We investigate a stationary process's crypticity (a measure of the difference between its hidden state information and its observed information) using the causal states of computational mechanics. 
Here, we motivate crypticity and cryptic order as physically meaningful quantities that monitor 
how hidden a hidden process is. This is done by recasting previous results on the convergence 
of block entropy and block-state entropy in a geometric setting, one that is more intuitive and 
that leads to a number of new results. For example, we connect crypticity to how an observer 
synchronizes to a process. We show that the block-causal-state entropy is a convex function of block 
length. We give a complete analysis of spin chains. We present a classification scheme that surveys 
stationary processes in terms of their possible cryptic and Markov orders. We illustrate related 
entropy convergence behaviors using a new form of foliated information diagram. Finally, along 
the way, we provide a variety of interpretations of crypticity and cryptic order to establish their 
naturalness and pervasiveness. Hopefully, these will inspire new applications in spatially extended 
and network dynamical systems. 
A black box is a metaphor for ignorance: One 
cannot see inside, but the presumption is that 
something, unknown in whole or in part, is there 
to be discovered. Moreover, the conceit is that 
the impoverished outputs from the box do con- 
tain something partially informative. Physically, 
ignorance comes in the act of measurement — 
measurements that are generically incomplete, in- 
accurate, and infrequent. Since measurements 
dictate that one can have only a partial view, it 
goes without saying that these distortions make 
discovery both difficult and one of the key chal- 
lenges to scientific methodology. Measurement 
necessarily leads to our viewing the world as be- 
ing hidden from us. Of course, the world is 
not completely hidden. If it were, then there 
would be neither gain nor motivation to probing 
measurements to build models. Scientific theory 
building and its experimental verification oper- 
ate, then, in the framework of hidden processes — 
processes from which we have observations from 
which, in turn, we attempt to understand the hid- 
den mechanisms. At least philosophically, this 
setting is not even remotely new. The circum- 
stance is that addressed by Plato's metaphor of 
our knowledge of the world deriving from the data 
of shadows on a cave wall. 

Fortunately, we are far beyond metaphors these 
days. Hidden processes pose a quantitative ques- 
tion: How hidden are they? Here, we show how 
to quantitatively measure just this: How much in- 
ternal information is hidden by measuring a pro- 
cess? Of course, this assumes, as in the black box 
metaphor, that there is something to be discov- 
ered. The tool we use to ground the intentional 
stance of discovering the internal mechanisms — 
to say what is hidden — is computational mechan- 
ics. Computational mechanics is a theory of what 
patterns are and how to measure a hidden pro- 
cess's degree of structure and organization. Com- 
putational mechanics has a long history, though, 
going back to the original challenges of nonlin- 
ear modeling posed in the 1970s that led to the 
concept of reconstructing "geometry from a time series". The explorations here can be seen in this 
light, with one important difference: Computa- 
tional mechanics shows that measurements of a 
hidden process tell how the process's internal or- 
ganization should be represented. Building on 
this, we develop a quantitative theory of how hid- 
den processes are. 



I. INTRODUCTION 

Many scientific domains face the confounding problems of defining and measuring information processing in dynamical systems. These range from technology to fundamental science and, even, epistemology of science [1]: 

1. The 2020 Digital Roadblock: The end of Moore's scaling laws for microelectronics [2-4]. 

2. The Central Dogma of Neurobiology: How are the 
intricate physical, biochemical, and biological com- 
ponents structured and coordinated to support nat- 
ural, intrinsic neural computing? 

3. Physical Intelligence: Does intelligence require biology, though? Or can there be alternative nonbiological substrates that support system behaviors that are to some degree "smart"? 

4. Structure versus Function: Intelligence aside, how 
do we define and detect spontaneous organization, 
in the first place? How do these emergent patterns 
take on and support functionality? 

Many have worked to quantify various aspects of infor- 
mation dynamics; cf. Ref. [5]. One often finds references 
to information storage, transfer, and processing. Sophis- 
ticated measures are devised to characterize these quanti- 
ties in multidimensional settings, including networks and 
adaptive systems. 

Here, we investigate foundational questions that bear 
on all these domains, using methods with very few 
modeling and representation requirements attached that, 
nonetheless, allow a good deal of progress. In quantifying information processing in stochastic dynamical systems, two measures have repeatedly appeared and been successfully applied: the past-future mutual information of observations (excess entropy) E [6, and references therein] and the internal stored information (statistical complexity) [7]. Curiously, the difference between these measures, the crypticity $\chi$ [8], has only recently received attention. To our knowledge, the first attempt to understand $\chi$ directly was in Ref. [8]. The 
following provides additional perspective and clarity to the results contained there and in the related works of Refs. [9-11]. In particular, we add to the body of 
knowledge surrounding crypticity and cryptic order, de- 
velop a further classification of the space of processes, 
and introduce several alternative ways to visualize these 
concepts. An appendix demonstrates that crypticity cap- 
tures a notable and unique property, when compared to 
alternative information measures. The goal is to pro- 
vide a more intuitive and geometric toolbox for pos- 
ing and answering the increasing range and increasingly 
more complex research challenges surrounding informa- 
tion processing in nature and technology. 

II. DEFINITIONS 

We denote contiguous groups of random variables $X_i$ using $X_{n:m+1} = X_n \cdots X_m$. A semi-infinite group is denoted either $X_{n:} = X_n X_{n+1} \cdots$ or $X_{:n} = \cdots X_{n-2} X_{n-1}$. We refer to these as the future and the past, respectively. Consistent with this, the bi-infinite chain of random variables is denoted $X_{:}$. A process is specified by the distribution $\Pr(X_{:})$. Throughout the following, we assume we are given a stationary process. 

Please refer to Refs. [8, 11] for supplementary definitions of presentations, causal states, ε-machines, unifilarity, co-unifilarity, Shannon block information, information diagrams, and the like. The following assumes familiarity with these concepts and the results and techniques there. However, our development calls for a few reminders. 

There are two notions of memory central to characterizing stochastic processes. These are the excess entropy E (sometimes called the predictive information) and the statistical complexity $C_\mu$. The excess entropy is a measure of correlation between the past and future: the degree to which one can remove uncertainty in the future given knowledge of the past. (This is illustrated as the green information atom at the intersection of the past and future in the information diagram of Fig. 1.) The statistical complexity is a quantity that arises in the context of modeling rather than prediction. Specifically, $C_\mu$ is the amount of information required for an observer to synchronize to a stochastic process. In the setting of finite-state hidden Markov models, it is the information stored in the process's causal states. 

Then, we have the crypticity: 

Definition 1. A process's crypticity $\chi$ is defined as:

$$\chi \equiv H[\mathcal{S}_0 | X_{0:}] ,$$

where $\mathcal{S}_t$ is a process's causal state at time $t$. 

Clearly, the definition relies on having a process's ε-machine presentation; the states used are causal states. Other presentations, whose alternative states we denote $\mathcal{R}$, suggest an analogous, but more general, definition of crypticity; cf. Ref. [12]. 

To give us something to temporarily hang our hat on, it turns out that the crypticity is simply how much stored information is hidden from observations. That is, it is the difference between the internal stored information ($C_\mu$) and the apparent past-future mutual information (E): $\chi = C_\mu - \mathbf{E}$. This is directly illustrated in Fig. 1. 
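The difference $C_\mu - \mathbf{E}$ and the conditional entropy of Definition 1 can be compared numerically. The sketch below is our own illustration, not code from the text: it assumes the Golden Mean Process with probability 1/2 on each branch out of state A (A emits 1 and stays, or emits 0 and moves to B; B emits 1 and returns to A), a hypothetical dictionary encoding of the unifilar machine, and hypothetical function names. It computes $C_\mu$, the finite-length estimate $\mathbf{E}(L) = H[X_{0:L}] - h_\mu L$, and $H[\mathcal{S}_0 | X_{0:L}]$ by exhaustively enumerating words.

```python
from collections import defaultdict
from math import log2

# Golden Mean Process with p = 1/2 (illustrative choice).
# Unifilar machine encoding: state -> {symbol: (next_state, probability)}
GOLDEN_MEAN = {
    "A": {"1": ("A", 0.5), "0": ("B", 0.5)},
    "B": {"1": ("A", 1.0)},
}


def entropy(dist):
    """Shannon entropy in bits of a dict of probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def stationary(machine, iters=5000):
    """Stationary state distribution via lazy power iteration."""
    pi = {s: 1.0 / len(machine) for s in machine}
    for _ in range(iters):
        nxt = {s: 0.5 * p for s, p in pi.items()}  # lazy step avoids periodicity issues
        for s, p in pi.items():
            for _, (t, q) in machine[s].items():
                nxt[t] += 0.5 * p * q
        pi = nxt
    return pi


def crypticity_estimates(machine, L):
    """Return (C_mu, E(L), H[S_0 | X_{0:L}]) for a unifilar machine."""
    pi = stationary(machine)
    c_mu = entropy(pi)
    # Entropy rate h_mu = H[X_0 | S_0], averaged over the stationary distribution.
    h_mu = sum(pi[s] * entropy({x: q for x, (_, q) in machine[s].items()})
               for s in machine)
    joint = defaultdict(float)  # Pr(S_0 = s, X_{0:L} = w)

    def walk(s0, s, word, p):
        if len(word) == L:
            joint[(s0, word)] += p
            return
        for sym, (t, q) in machine[s].items():
            walk(s0, t, word + sym, p * q)

    for s0, p0 in pi.items():
        walk(s0, s0, "", p0)
    words = defaultdict(float)
    for (_, w), p in joint.items():
        words[w] += p
    block = entropy(words)                     # H[X_{0:L}]
    return c_mu, block - h_mu * L, entropy(joint) - block


c_mu, e_L, chi_L = crypticity_estimates(GOLDEN_MEAN, L=8)
print(f"C_mu = {c_mu:.4f}, E(L) = {e_L:.4f}, C_mu - E(L) = {c_mu - e_L:.4f}, "
      f"H[S_0|X_0:L] = {chi_L:.4f}")
```

For this Markov-order-1 example the finite-length estimates already equal their limits, and both routes give 2/3 bit of crypticity out of $C_\mu \approx 0.918$ bit of stored information.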



FIG. 1. Crypticity $\chi$ is represented by the red (dark) crescent shape in this ε-machine I-diagram. The excess entropy E is represented by the (green) overlap of the past information $H[X_{:0}]$ and the future information $H[X_{0:}]$. The statistical complexity $C_\mu = H[\mathcal{S}]$ is the information in the internal causal states $\mathcal{S}$ and comprises both $\chi$ and E. For a review of information measures and diagrams, refer to the citations given in the text or quickly read the first portions of Sec. IX. 

We are also interested in the range required to "learn" 
the crypticity. This is the cryptic order. 

Definition 2. A process's cryptic order $k$ is defined as:

$$k = \min\{L \in \mathbb{Z}^+ : H[\mathcal{S}_L | X_{0:}] = 0\} .$$
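As a worked illustration of both definitions (our own example, not one taken from the text), consider the Golden Mean Process in which state A emits 1 (staying in A) or 0 (moving to B) with probability 1/2 each, and B emits 1 and returns to A, so that $\Pr(A) = 2/3$ and $\Pr(B) = 1/3$. Only A can emit a 0, and after any 1 the machine is in A whatever it started in, so the symbols after $X_0$ carry no further information about $\mathcal{S}_0$:

$$
\begin{aligned}
\Pr(X_0 = 1) &= \tfrac{2}{3}\cdot\tfrac{1}{2} + \tfrac{1}{3} = \tfrac{2}{3}, \qquad
\Pr(\mathcal{S}_0 = A \mid X_0 = 1) = \frac{(2/3)(1/2)}{2/3} = \tfrac{1}{2}, \\
\chi = H[\mathcal{S}_0 | X_{0:}] &= \Pr(X_0 = 1)\, H[\mathcal{S}_0 | X_0 = 1] = \tfrac{2}{3} \times 1 = \tfrac{2}{3}~\text{bit}, \\
H[\mathcal{S}_1 | X_{0:}] &= 0, \quad \text{since } X_0 = 0 \Rightarrow \mathcal{S}_1 = B \text{ and } X_0 = 1 \Rightarrow \mathcal{S}_1 = A .
\end{aligned}
$$

The smallest $L$ with $H[\mathcal{S}_L | X_{0:}] = 0$ is therefore $L = 1$, so this process has cryptic order $k = 1$.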

These definitions do not easily admit an intuitive inter- 
pretation. Their connection to hidden stored information 
is not immediately clear, for example. They mask the im- 
portance and centrality of the crypticity property. Given 
this, we devote some effort in the following to motivate 
them and to give several supplementary interpretations. 
As a start, Fig. 1 gives a graphical definition of crypticity using the ε-machine information diagram of Ref. [8]. It is the red crescent highlighted there, which is the state information $C_\mu = H[\mathcal{S}]$ minus that information derivable from the future, $I[\mathcal{S}; X_{0:}] = \mathbf{E}$. This begins to explain 
crypticity as a measure of a process's hidden-ness. We'll 
return to this, but first let's consider several other alter- 
natives. 

III. CRYPTICITY: FROM STATE PATHS TO 
SYNCHRONIZATION 

Crypticity and, in particular, cryptic order have straightforward interpretations when one considers the internal state-paths taken as an observer synchronizes to a process [13]. In this, cryptic order is seen to be analogous to, and potentially simpler than, a process's Markov order. While both the Markov and cryptic orders derive from a notion of synchronization, the cryptic order depends on a subset of the paths realized during synchronization. We illustrate this via an example: the $(R, k)$-Golden Mean Process, a generalization of the Golden Mean Process with tunable Markov order $R$ and tunable cryptic order $k$. In particular, we examine the (3, 2)-Golden Mean Process shown in Fig. 2. 
FIG. 2. The (3, 2)-Golden Mean ε-machine: Markov order 3 and cryptic order 2. 

It is straightforward to verify that the only words of length 3 generated by this process are {000, 001, 011, 100, 110, 111}. Since the process is Markov order 3 (by construction), we know that each of these words is a synchronizing word [14]. Some words lead to synchronization in fewer than three steps, though. For instance, 011 yields synchronization to state E after just the first two symbols 01. 

In Fig. 3 we display the internal-state paths taken by 
each possible initial state under evolution governed by 
the six synchronizing words. Let's take a moment to 
describe these illustrations carefully. Before reading any 
word, there is maximum uncertainty in the internal state. 
We represent this using a circle for each of the five causal 
states of the ε-machine. Each of these states is led to a 
next state by following the first symbol seen [15]. For 
word 001, the first symbol is 0, and A, for instance, is 
led to B. Notice that E is not led to any state. This 
is because E has no outgoing transition with symbol 0. 
The path from E, therefore, ends and is not considered 
further. The termination of paths is one of the important 
features of synchronization to note. 

Looking at the synchronizing word 100, we see that the 
transition on the first symbol 1 takes both states A and E 
to the same state A. Since we use unifilar presentations 
(ε-machines), this merging can never be undone. Path 
merging is yet another important feature. 

Both the termination and merging of paths are rele- 
vant to synchronization, but have different roles in the 
determination of the Markov and cryptic orders. 

Although we already know the Markov order of this process, we can read it from Fig. 3 by looking at the lengths for each word where only one path remains. These lengths {3, 3, 2, 2, 2, 2} are marked with orange diamonds. The maximum value of this length is the Markov order (3, in this example). 

FIG. 3. Synchronization paths for the (3, 2)-Golden Mean ε-machine, one panel per synchronizing word (000, 001, 011, 100, 110, 111): Each synchronizing word induces a set of state-paths, some of which terminate and some of which merge. 



In the next illustration, Fig. 4, we keep only those paths that do not terminate early. In this way, we remove paths that generally are quite long but that terminate before having the chance to merge with the final synchronizing paths. We similarly mark, with green triangles, the lengths where these reduced paths have ultimately merged. Note that restricting paths can only preserve or decrease each length. Finally, in analogy to the Markov order, the maximum of these lengths {0, 0, 0, 1, 2, 2} is the cryptic order (2, in this example). 
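The path bookkeeping just described can be automated. The figure itself is not reproduced here, so the sketch below assumes a hypothetical five-state topology that we reconstructed to match every detail stated in the text (the six allowed words, E having no 0-transition, A going to B on 0, A and E both going to A on 1, and the listed lengths); transition probabilities play no role in path analysis and are omitted. For each allowed word it traces every start state, takes the earliest time at which all still-alive paths occupy one state (the Markov contribution), then repeats the test using only the paths that survive the whole word (the cryptic contribution). All names are ours.

```python
from itertools import product

# Hypothetical topology for the (3, 2)-Golden Mean epsilon-machine:
# state -> {symbol: next_state}
GM_3_2 = {
    "A": {"1": "A", "0": "B"},
    "B": {"0": "C"},
    "C": {"0": "D"},
    "D": {"1": "E"},
    "E": {"1": "A"},
}


def trace_paths(machine, word):
    """State-path from every start state; a path stops when no edge matches."""
    paths = []
    for start in machine:
        path = [start]
        for sym in word:
            nxt = machine[path[-1]].get(sym)
            if nxt is None:          # no outgoing edge with this symbol: terminate
                break
            path.append(nxt)
        paths.append(path)
    return paths


def first_merge(paths, length):
    """Earliest time at which all paths still alive occupy a single state."""
    for t in range(length + 1):
        alive = {p[t] for p in paths if len(p) > t}
        if len(alive) == 1:
            return t
    return None                      # the word does not synchronize


def sync_lengths(machine, word):
    """(Markov contribution, cryptic contribution) for one word."""
    L = len(word)
    paths = trace_paths(machine, word)
    survivors = [p for p in paths if len(p) == L + 1]  # read the whole word
    return first_merge(paths, L), first_merge(survivors, L)


def orders(machine, L):
    """Per-word contributions and their maxima over all allowed length-L words."""
    alphabet = sorted({sym for edges in machine.values() for sym in edges})
    per_word = {}
    for letters in product(alphabet, repeat=L):
        word = "".join(letters)
        if any(len(p) == L + 1 for p in trace_paths(machine, word)):
            per_word[word] = sync_lengths(machine, word)
    if any(m is None for m, _ in per_word.values()):
        raise ValueError("some allowed word of this length is not synchronizing")
    markov = max(m for m, _ in per_word.values())
    cryptic = max(c for _, c in per_word.values())
    return per_word, markov, cryptic


per_word, R, k = orders(GM_3_2, 3)
for word, (m, c) in per_word.items():
    print(word, "Markov:", m, "cryptic:", c)
print("Markov order:", R, "cryptic order:", k)
```

With this assumed topology the per-word lengths reproduce the orange-diamond values {3, 3, 2, 2, 2, 2} and the green-triangle values {0, 0, 0, 1, 2, 2} quoted above.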








FIG. 4. The paths that are not terminated before the Markov 
order are highlighted in red. These are the paths relevant for 
the cryptic order. For each word the contribution to Markov 
order is still indicated by an orange diamond, whereas the 
contribution toward cryptic order is indicated by a green tri- 
angle. 



This demonstrates how crypticity relates to paths and path merging. It is a small step then to ask for a direct connection to co-unifilarity [5]: $H[\mathcal{S}_0 | X_0 \mathcal{S}_1] = 0$. In fact, there are three primary equivalent statements about a process: (i) its ε-machine is co-unifilar, (ii) its $\chi = 0$, and (iii) its cryptic order $k = 0$. (Appendix C presents a proof of this equivalence in terms of entropy growth functions and includes the connection to cryptic order as well.) 
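Co-unifilarity is a topological property of the ε-machine, so it can be tested directly: no state may have two incoming edges carrying the same symbol. A minimal sketch, assuming a hypothetical dictionary encoding that lists only positive-probability edges and two standard example processes of our choosing (Golden Mean and Even). By the equivalence above, passing the test on an ε-machine means $\chi = 0$ and $k = 0$, while failing it signals $\chi > 0$; the check is only meaningful on the minimal unifilar presentation, not on an arbitrary one.

```python
from collections import Counter


def is_co_unifilar(machine):
    """True if no state has two incoming edges with the same symbol.

    `machine` maps state -> {symbol: next_state}, listing only edges with
    positive probability. Under this condition the pair (X_0, S_1) fixes S_0,
    i.e. H[S_0 | X_0 S_1] = 0.
    """
    incoming = Counter((nxt, sym)
                       for edges in machine.values()
                       for sym, nxt in edges.items())
    return all(count == 1 for count in incoming.values())


# Illustrative epsilon-machine topologies (probabilities omitted).
GOLDEN_MEAN = {"A": {"1": "A", "0": "B"}, "B": {"1": "A"}}
EVEN = {"A": {"0": "A", "1": "B"}, "B": {"1": "A"}}

for name, m in [("Golden Mean", GOLDEN_MEAN), ("Even", EVEN)]:
    print(name, "co-unifilar:", is_co_unifilar(m))
```

The Golden Mean machine fails because state A receives the symbol 1 from both A and B, whereas every incoming (state, symbol) pair in the Even machine is unique.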

This exposes the elementary nature of the cryptic order as a property of synchronizing paths. Appendix B goes further to show that state-paths traced similarly, but in the reverse time direction, are the same as those singled out in the forward direction, as just done. The remainder 
of this section offers different perspectives on crypticity, 
some of which are less strict, but provide intuition and 
suggest its broad applicability. 
A. Global versus Local

Imagine a synchronization task involving a group of 
agents. The agents begin in different locations (states) 
and move to next locations based on the synchronization 
input they receive from a common controller. The goal is 
to provide a uniform input that causes (a subset of) the 
agents to arrive at the same location. This is reminis- 
cent of a road-coloring problem. In many road-coloring 
contexts, only uniform-degree graph structures are inves- 
tigated, largely due to theoretical tractability. However, 
real-world graphs are rarely of uniform degree. This means 
that some agents may receive instructions that they can- 
not carry out. These agents quit, and their paths are 
terminated. Assuming that the instructions are synchro- 
nizing for some subset of the agents (the instruction is 
a synchronizing word), the synchronization task will end 
with this subset of agents at the desired destination. 

There are two ways in which we may view this process. 
One is global, and corresponds to the Markov order, while 
the other is local and corresponds to the cryptic order. 

If we monitor the entire collection of agents from a 
bird's eye view evolving under the synchronization input, 
we observe paths terminating and merging. Our global 
notion of synchronization is the point at which each path 
is either terminated or merged with every other valid 
path. This is clearly coincident with the description of 
Markov order previously described. 

Alternatively, we monitor the collective by querying 
the agents after the task is complete. The unsuccess- 
ful agents, whose paths were terminated, never arriving 
at the destination, cannot be queried. From this view- 
point, synchronization takes place relative to the group 
of agents that were not terminated. As locally inter- 
acting entities, they know the latest time at which an 
agent merged with their group — the group which ulti- 
mately synchronized. Even after this event, there may be 
other agents still operating that will inevitably be termi- 
nated at some later time. This means that from the local 
(agent) perspective, synchronization may happen earlier 
than from the global (controller) perspective. 

We claim, based on this setting, that the cryptic order 
has a straightforward and physically relevant basis in the 
context of synchronization. Upcoming discussions, some 
more technical, will emphasize this point further, as well 
as demonstrate new results. 



B. Mazes and Stacks 

The Markov versus cryptic order distinction is relevant to any maze-solving algorithm [16]. Imagining the solution of a maze as a sequence of moves (left, right, or straight), we may write down a list of potential solutions (which must contain all actual solutions) by listing all $3^n$ sequences of $n$ moves [17]. A brute-force algorithm tries all of 
these paths. Since we are interested in worst-case scenar- 
ios, many of the details (e.g., depth- versus breadth-first 
search) are not relevant. What is relevant is the object 
that the algorithm must maintain in memory or that it 
ultimately returns to the user. 

An algorithm might try out each potential solution, 
feeding in each move sequentially and testing for either 
maze completion or termination (walking into a wall or a 
previously visited location) at each step. The end of each 
solution is marked with a length. When all solutions have 
been tried, this set of solutions and lengths is returned. 
While this is not a stationary stochastic process, we may 
think of the longest of these lengths as being similar to 
the Markov order. The speed and memory use of this 
algorithm are obviously improved by using a tree struc- 
ture, but this does not affect the result we are interested 
in. 

If we were only interested in paths which end in maze 
completion, an even more memory-conscious algorithm 
would realize that dead-ends in the tree could be re- 
moved. One accomplishes this with a stack memory for 
the active-path tree branch. Reaching a nonsolving ter- 
mination, the algorithm pops the end states until return- 
ing to the most recent unexplored option. This process 
continues recursively until the tree has been filled out. 
The relevant lengths are now the lengths of the maze- 
completing paths (all root-to-leaf paths), the longest of 
which is an analog of the cryptic order. 
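A minimal sketch of the stack-based variant, assuming a small grid maze of our own invention ('S' start, 'E' exit, '#' wall, '.' open) and absolute up/down/left/right moves instead of the left/right/straight encoding above; a path ends when it reaches the exit or has no legal unvisited move left. It records the length of every path when it ends and, separately, the lengths of the exit-reaching paths, so the two maxima play the roles of the Markov-order and cryptic-order analogs. The function and maze are hypothetical.

```python
def explore(maze):
    """Depth-first search with an explicit path stack.

    Returns (all_lengths, solving_lengths): the move count of every path when
    it ends, and the move counts of the paths that reach the exit.
    """
    rows, cols = len(maze), len(maze[0])
    start = next((r, c) for r in range(rows) for c in range(cols)
                 if maze[r][c] == "S")
    all_lengths, solving_lengths = [], []
    path = [start]                      # the active branch, used as a stack

    def extend():
        r, c = path[-1]
        if maze[r][c] == "E":           # maze completed
            all_lengths.append(len(path) - 1)
            solving_lengths.append(len(path) - 1)
            return
        moved = False
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] != "#"
                    and (nr, nc) not in path):
                moved = True
                path.append((nr, nc))   # push
                extend()
                path.pop()              # pop back to the most recent open choice
        if not moved:                   # dead end: this path terminates
            all_lengths.append(len(path) - 1)

    extend()
    return all_lengths, solving_lengths


MAZE = ["SE",
        ".#",
        ".#",
        ".#"]
ends, solved = explore(MAZE)
print("longest path overall:", max(ends), "longest solving path:", max(solved))
```

Here the dead-end corridor produces the longest path (3 moves) even though the only solution takes a single move, mirroring how terminating paths can push the Markov order above the cryptic order.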



C. Transient versus Relaxed 

Rather than using the global versus local distinction, 
we can think in terms of a dynamical view of synchroniza- 
tion. We might imagine a collection of ants attempting 
to create paths from a resource-rich region to their nest; 
or a watershed in the process of forming the transport 
network from collection regions to the main body of wa- 
ter. Until these networks develop, it is not clear which 
will become the important paths. 

A log not worth climbing over causes ants to make 
the effort less often, thereby dropping less pheromone, 
leading fewer ants to attempt this path, until finally it is 
empty. Similarly, slow water deposits more sediment and 
fills underused channels. As these networks evolve from 
an initial transitory state to relaxed state, the types of 
paths within the network and their synchronization prop- 
erties change. In particular, while the early-time synchro- 
nization depends on the terminating paths, the later-time 
synchronization will not. In this dynamical picture we 






see that a property akin to cryptic order emerges as the 
system evolves. 

D. Naive versus Informed 

It is only a small step from this dynamical picture to 
view these self-reinforcing systems as evolving from naive 
to informed states. Over time, a system "realizes" which 
paths are undesirable and quits them. Consider an in- 
dividual learning to navigate a new city. She will ex- 
perience a similar network evolution, where the pruning 
of dead-end paths is an intentional act. This navigation 
structure also will tend to reflect the cryptic order. 

E. Statistical Complexity versus Crypticity 

In addition to describing the Markov and cryptic or- 
ders via a dynamical picture of synchronization, we can 
explore the same phenomenon with the associated en- 
tropies, a more statistical perspective. 

Beginning with the global view, the entropy of the distribution over the set of all starting points is the state entropy $H[\mathcal{S}_0]$, commonly called the statistical complexity $C_\mu$. By considering the initial state distribution conditioned on the removal of the terminating paths, we are left with only a portion of this entropy, and this is the crypticity $\chi$ [8]. As discussed, we might consider this removal a result of memory, relaxation, or prescience. 

IV. CRYPTICITY THROUGH INFORMATION 
THEORY 

The discussion above in terms of paths is relatively 
intuitive. The original conception, however, was not 
in terms of paths, but rather in terms of information- 
theoretic quantities. Information identities based on 
ε-machines are beginning to provide a growing set of interpretations; some more subtle, some more direct than 
others. The following will show that crypticity and cryp- 
tic order have diverse implications and also that even 
elementary information-theoretic quantities form a rich 
toolset. 

A. Crypticity 

The ε-machine causal presentation pairs up pasts with 
futures in a way appropriate for prediction. Since pasts 
can be different but predictively equivalent, this pairing 
operates on sets of pasts that, in turn, are equivalent to 
the causal states themselves. Furthermore, a single past 



can be followed by a set of futures. This is natural since 
the processes are stochastic. So, any past or predictively 
equivalent group of pasts is linked to a distribution of 
futures. Finally, these future distributions often overlap. 
As we will now show, crypticity is a measure of this over- 
lap. 

Historically, it has taken some time to sort out the similarities and differences between various measures of memory. Eventually, two emerged naturally as key concepts: $C_\mu$, the statistical complexity or information processing "size" of the internal mechanism; and E, the excess entropy, or the apparent (to an observer) amount of past-future mutual information. It has been recognized for some time [19, 20] that $C_\mu$ is an upper bound on E. The strictness of this inequality and the nature of the relationship between the two, however, were not significantly explored until Ref. [8]. The first simple statement [8] about crypticity in terms of information-theoretic quantities is that it is the quantifiable difference between two predominant measures of information storage: $\chi = C_\mu - \mathbf{E}$. 

Taking this view a bit further, since E is the amount of uncertainty in the future that one can reduce through study of the past, and $C_\mu$ is the amount of information necessary to do optimal prediction (using a minimal predictor), their difference is the amount of modeling overhead. One may object that a minimal optimal predictor should not require more information than will be made use of. In fact, it is known that many processes with large $\chi$ have nonunifilar representations that are much 
smaller 0|. What is not obvious is that this is simply a 
re-representation of the causal states as mixtures of the 
new states [IIIIH]. In other words, the overhead is in- 
escapable. This suggests a useful language with which to 
discuss stochastic processes — not only do we identify a 
process with an e-machine, but we analyze the efficiency 
of these machines in terms of required resources. 

For the following, we briefly invoke the reverse ε-machine, the causal representation of a process when scanned in reverse, to extend our view of crypticity. (For details on reverse causal states, see Refs. [8, 11].) 
Recall that forward causal states are built for prediction 
and, similarly, reverse causal states are built for retrod- 
iction. We say they are "built" for these purposes in the 
sense that they are minimal and optimal, two desirable 
design goals. 

Given this, it is somewhat surprising to see that forward causal states are better at retrodiction than reverse causal states. The information diagram in Fig. 5 illustrates this. We will now show that the degree to which this is true is precisely the forward process's crypticity. Here, we write this difference in retrodictive uncertainty as follows:

$$H[X_{-L:0} | \mathcal{S}_0^-] - H[X_{-L:0} | \mathcal{S}_0^+] \geq 0 .$$

Then, this difference converges to $\chi$:

$$\lim_{L \to \infty} \left( H[X_{-L:0} | \mathcal{S}_0^-] - H[X_{-L:0} | \mathcal{S}_0^+] \right) = \chi .$$

FIG. 5. Crypticity as the degree to which forward causal states are better retrodictors than reverse causal states.

We might wonder why the reverse causal states were not built to be better at their job. This is explained by the fact that the information input to the above constructs is not equivalent. The forward causal states are built from the past, while the reverse causal states are built from the future. It is no surprise, then, that the forward states can offer information about the pasts from which they were built. It is more interesting to consider why they do not maintain all of this information. This is because the forward states were designed for predicting a stochastic process, a goal for which maintaining information about the past offers diminishing returns.

Rather than comparing the function of two objects (forward and reverse causal states), we can compare two functions of the same object. In this light, the crypticity is the degree to which forward causal states are better at retrodiction than they are at prediction. More precisely, we have:

$$
\begin{aligned}
H[X_{0:L} | \mathcal{S}_0] - H[X_{0:L} | \mathcal{S}_L]
&= H[\mathcal{S}_0 X_{0:L}] - H[X_{0:L} \mathcal{S}_L] \\
&= H[\mathcal{S}_0 | X_{0:L} \mathcal{S}_L] - H[\mathcal{S}_L | \mathcal{S}_0 X_{0:L}] \\
&= H[\mathcal{S}_0 | X_{0:L} \mathcal{S}_L] \\
&\geq 0 .
\end{aligned}
$$

The first step follows from stationarity, the second appeals to an informational identity, and the next to unifilarity of the ε-machine. Similarly, this difference converges to $\chi$:

$$\lim_{L \to \infty} H[\mathcal{S}_0 | X_{0:L} \mathcal{S}_L] = \lim_{L \to \infty} H[\mathcal{S}_0 | X_{0:L}] = \chi .$$

Thus, crypticity is the amount of information that, although necessary for current prediction, must be erased at some future time. 
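The chain of equalities relating prediction and retrodiction by forward causal states can be checked numerically. The sketch below is ours: it assumes a five-state topology consistent with the (3, 2)-Golden Mean description in Sec. III (A goes to A on 1 and to B on 0; then B, C, D, E follow 0, 0, 1, 1 back to A), with an assumed probability of 1/2 on each branch out of A. It builds the joint distribution of $(\mathcal{S}_0, X_{0:L}, \mathcal{S}_L)$ and computes both the left-hand side $H[X_{0:L} | \mathcal{S}_0] - H[X_{0:L} | \mathcal{S}_L]$ and the residual $H[\mathcal{S}_0 | X_{0:L} \mathcal{S}_L]$. Names are hypothetical.

```python
from collections import defaultdict
from math import log2

# Hypothetical (3, 2)-Golden Mean topology with assumed branch probabilities:
# state -> {symbol: (next_state, probability)}
GM_3_2 = {
    "A": {"1": ("A", 0.5), "0": ("B", 0.5)},
    "B": {"0": ("C", 1.0)},
    "C": {"0": ("D", 1.0)},
    "D": {"1": ("E", 1.0)},
    "E": {"1": ("A", 1.0)},
}


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def stationary(machine, iters=5000):
    """Stationary distribution via lazy power iteration."""
    pi = {s: 1.0 / len(machine) for s in machine}
    for _ in range(iters):
        nxt = {s: 0.5 * p for s, p in pi.items()}
        for s, p in pi.items():
            for _, (t, q) in machine[s].items():
                nxt[t] += 0.5 * p * q
        pi = nxt
    return pi


def triple_joint(machine, L):
    """Pr(S_0, X_{0:L}, S_L) starting from the stationary distribution."""
    joint = defaultdict(float)

    def walk(s0, s, word, p):
        if len(word) == L:
            joint[(s0, word, s)] += p
            return
        for sym, (t, q) in machine[s].items():
            walk(s0, t, word + sym, p * q)

    for s0, p0 in stationary(machine).items():
        walk(s0, s0, "", p0)
    return joint


def H(joint, idx):
    """Entropy of the marginal on coordinates idx (0 = S_0, 1 = X_{0:L}, 2 = S_L)."""
    marg = defaultdict(float)
    for key, p in joint.items():
        marg[tuple(key[i] for i in idx)] += p
    return entropy(marg)


def retrodiction_advantage(machine, L):
    """Return (H[X|S_0] - H[X|S_L], H[S_0 | X S_L]) with X = X_{0:L}."""
    j = triple_joint(machine, L)
    prediction = H(j, (0, 1)) - H(j, (0,))      # H[X_{0:L} | S_0]
    retrodiction = H(j, (1, 2)) - H(j, (2,))    # H[X_{0:L} | S_L]
    residual = H(j, (0, 1, 2)) - H(j, (1, 2))   # H[S_0 | X_{0:L} S_L]
    return prediction - retrodiction, residual


for L in range(1, 6):
    diff, res = retrodiction_advantage(GM_3_2, L)
    print(L, round(diff, 4), round(res, 4))
```

For this assumed machine the two columns agree at every $L$, rising from 1/3 bit at $L = 1$ and leveling off at 2/3 bit from $L = 2$ on.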



B. Cryptic Order 

Many of these statements about uncertainty can be rephrased in terms of length scales. The length scale associated with the crypticity is the cryptic order: the distance we must look into the past to discover the modeling overhead. Following our discussion of forward and reverse states, we can interpret cryptic order as the length at which the difference converges to $\chi$:

$$k = \min\{L : H[X_{-L:0} | \mathcal{S}_0^-] - H[X_{-L:0} | \mathcal{S}_0^+] = \chi\} .$$

Stated differently, it is the length at which all advantage of a forward state over a reverse state as a retrodictor is lost. In other words:

$$k = \min\{L : H[X_0 | X_{1:L+1} \mathcal{S}_{L+1}^+] = H[X_0 | X_{1:L+1} \mathcal{S}_{L+1}^-]\} .$$

Equivalently, cryptic order is the length at which a forward state's uncertainty in prediction and retrodiction equalize. More colloquially, it is the range beyond which a forward state is equally good at prediction and retrodiction, or:

$$k = \min\{L : H[X_L | \mathcal{S}_0 X_{0:L}] = H[X_0 | X_{1:L+1} \mathcal{S}_{L+1}]\} .$$

As Sec. III suggested, the cryptic order $k$ is closely analogous to the Markov order $R$. Here, we state the parallel formally:

$$R = \min\{L : H[\mathcal{S}_L | X_{0:L}] = 0\} ,$$

$$k = \min\{L : H[\mathcal{S}_L | X_{0:L}, X_{L:}] = 0\} .$$

Appendix argues for the uniqueness of this parallel. 
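The formal parallel suggests computing both orders the same way. A minimal sketch, assuming the dictionary encoding used in our other examples, the Golden Mean Process, and a hypothetical (3, 2)-Golden Mean topology consistent with Sec. III with assumed probability 1/2 on each branch out of A. The infinite future $X_{L:}$ in the definition of $k$ is approximated by a finite window $X_{L:L+\text{horizon}}$; this is an assumption that only works when the remaining ambiguity in $\mathcal{S}_L$ resolves within that window. For processes such as the Even Process, whose true cryptic order is 0 but for which $H[\mathcal{S}_0 | X_{0:n}]$ stays positive at every finite $n$, the finite-window search returns None.

```python
from collections import defaultdict
from math import log2

# Illustrative machines: state -> {symbol: (next_state, probability)}
GOLDEN_MEAN = {"A": {"1": ("A", 0.5), "0": ("B", 0.5)}, "B": {"1": ("A", 1.0)}}
# Hypothetical (3, 2)-Golden Mean topology with assumed branch probabilities.
GM_3_2 = {
    "A": {"1": ("A", 0.5), "0": ("B", 0.5)},
    "B": {"0": ("C", 1.0)},
    "C": {"0": ("D", 1.0)},
    "D": {"1": ("E", 1.0)},
    "E": {"1": ("A", 1.0)},
}


def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def stationary(machine, iters=5000):
    pi = {s: 1.0 / len(machine) for s in machine}
    for _ in range(iters):
        nxt = {s: 0.5 * p for s, p in pi.items()}   # lazy power iteration
        for s, p in pi.items():
            for _, (t, q) in machine[s].items():
                nxt[t] += 0.5 * p * q
        pi = nxt
    return pi


def state_given_word(machine, pi, t, n):
    """H[S_t | X_{0:n}] for 0 <= t <= n."""
    joint = defaultdict(float)            # Pr(X_{0:n} = w, S_t = s)

    def walk(s, word, p, s_t):
        if len(word) == t:
            s_t = s                       # record the state at time t
        if len(word) == n:
            joint[(word, s_t)] += p
            return
        for sym, (nxt, q) in machine[s].items():
            walk(nxt, word + sym, p * q, s_t)

    for s0, p0 in pi.items():
        walk(s0, "", p0, None)
    words = defaultdict(float)
    for (w, _), p in joint.items():
        words[w] += p
    return entropy(joint) - entropy(words)


def markov_and_cryptic_order(machine, max_len=8, horizon=6, tol=1e-9):
    """R = min{L : H[S_L | X_{0:L}] = 0}; k uses X_{0:L+horizon} for X_{0:}."""
    pi = stationary(machine)
    R = next((L for L in range(max_len + 1)
              if state_given_word(machine, pi, L, L) < tol), None)
    k = next((L for L in range(max_len + 1)
              if state_given_word(machine, pi, L, L + horizon) < tol), None)
    return R, k


for name, m in [("Golden Mean", GOLDEN_MEAN), ("(3, 2)-Golden Mean", GM_3_2)]:
    print(name, "R, k =", markov_and_cryptic_order(m))
```

The entropy-based search agrees with the path-based reading of Figs. 3 and 4 for the assumed (3, 2) machine, returning $R = 3$ and $k = 2$, and gives $R = k = 1$ for the Golden Mean Process.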

Cryptic order is the length of the longest noninferable state sequence. Given an infinite string of measurements $\ldots X_{-2} X_{-1} X_0$, one eventually synchronizes to a particular causal state for any finite-state ε-machine. The same symbol sequence can then be used to retrodict states beginning at the point of synchronization. All but the earliest $k$ states can be definitively retrodicted regardless of which observed sequence (and resulting predictive state) occurs. 
V. CRYPTICITY AND ENTROPY 
CONVERGENCE 

It has become increasingly clear that entropy functions 
are useful characterizations of processes. Since a process 
is a bi-infinite collection of random variables [23], it typically is not useful to calculate the entropy of the entire 
collection. The alternative strategy is to analyze the en- 
tropy of increasingly large finite portions. The scaling, 
then, captures the system's bulk properties in the large- 
size (thermodynamic) limit, as well as how those proper- 
ties emerge from the individual components. 

These functions capture much of the behavior that we are interested in here. The block entropy $H[X_{0:L}]$ was used to great effect in Ref. [6] to understand the way perceived randomness may be reformulated as structure when longer correlations are considered. More recently, Ref. [12] used extended functions, the block-state entropy $H[X_{0:L} \mathcal{R}_L]$ and the state-block entropy $H[\mathcal{R}_0 X_{0:L}]$, to explore the relationship between alternative presentations of a process and the information-theoretic measures of memory in a presentation. 

We will borrow these two new entropy functions and 
turn them back on the canonical set of presentations, 
ε-machines, to expose the workings of crypticity. The 
result is a graphical approach that offers a more intu- 
itive understanding of the results originally developed in 
Ref. [11]. Using this, we sharpen several theorems, dis- 
cover new bounds, and pose additional challenges. 



A. Block Entropy 

The block entropy $H[X_{0:L}]$ is the joint Shannon entropy of finite sequences. As it is treated rather thoroughly in Ref. [6], we simply recall several of its features. 

First, recall that $X_{0:0}$ represents the random variable for a null observation and, since there is just one way to do this, $H[X_{0:0}] = 0$. As $L$ increases, the block entropy curve is a nondecreasing, concave function that limits to the linear asymptote $\mathbf{E} + h_\mu L$, where E is the excess entropy and $h_\mu$ is the process entropy rate. 

Given a block entropy curve, Markov processes are easily identified, since the curve reaches its linear asymptote at finite block length. That is, the Markov order $R$ satisfies:

$$R = \min\{L : H[X_{0:L}] = \mathbf{E} + h_\mu L\} .$$

Before reaching the Markov order, one has not discovered all process statistics and, so, new symbols appear more surprising than they otherwise would. Mathematically, this is formulated through a lower bound:

$$H[X_L | X_{0:L}] \geq h_\mu$$

for all $L$. Since the block entropy curve for Markovian processes reaches its asymptote at $L = R$ and since the linear asymptote has slope equal to the entropy rate, we know that Markov processes attain the lower bound whenever $L \geq R$: $H[X_L | X_{0:L}] = h_\mu$. 

Finally, since the block entropy is concave and nondecreasing, it is bounded above by its linear asymptote. This naturally leads to a concave, nondecreasing lower-bound estimate for the excess entropy:

$$\mathbf{E}(L) = H[X_{0:L}] - h_\mu L .$$

Thus, $\mathbf{E}(L) \le \mathbf{E}(L+1) \le \mathbf{E}$ and $\lim_{L \to \infty} \mathbf{E}(L) = \mathbf{E}$.
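A minimal sketch of the lower-bound estimate $\mathbf{E}(L)$ follows. It computes $H[X_{0:L}]$ by propagating word probabilities through a unifilar machine. Several choices here are assumptions made only for illustration:

- the dictionary encoding `machine[state] = [(symbol, probability, next_state), ...]`;
- the hand-entered stationary distribution;
- the test case, which is the Golden Mean Process (no consecutive 0s) with probability 1/2 on each edge out of state A.

```python
import math
from collections import defaultdict

# Assumed encoding: machine[state] = [(symbol, probability, next_state), ...].
# Golden Mean Process (no consecutive 0s) and its stationary state distribution.
GOLDEN_MEAN = {"A": [("1", 0.5, "A"), ("0", 0.5, "B")], "B": [("1", 1.0, "A")]}
PI = {"A": 2 / 3, "B": 1 / 3}


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def block_entropy(machine, pi, L):
    """H[X_{0:L}], summing over start states and the unique path each word takes."""
    dist = {("", s): p for s, p in pi.items()}  # (word, current state) -> probability
    for _ in range(L):
        nxt = defaultdict(float)
        for (w, s), p in dist.items():
            for x, q, t in machine[s]:
                nxt[(w + x, t)] += p * q
        dist = nxt
    words = defaultdict(float)
    for (w, _), p in dist.items():
        words[w] += p
    return entropy(words.values())


def entropy_rate(machine, pi):
    """h_mu = sum_s pi(s) H[X | S = s], valid for epsilon-machines (unifilar)."""
    return sum(pi[s] * entropy(q for _, q, _ in machine[s]) for s in machine)


h_mu = entropy_rate(GOLDEN_MEAN, PI)
E_est = [block_entropy(GOLDEN_MEAN, PI, L) - h_mu * L for L in range(6)]
for L, e in enumerate(E_est):
    print(f"L={L}  E(L)={e:.5f}")
```

The Golden Mean Process is order-1 Markov, so $\mathbf{E}(L)$ should reach its limit (about 0.2516 bits) at $L = 1$ and stay there.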



B. State-block entropy 

The state-block entropy $H[\mathcal{R}_0 X_{0:L}]$ is the joint uncertainty one has in a presentation's internal state $\mathcal{R}$ and the block of symbols immediately following. Its behavior is generally nontrivial, but when restricted to ε-machines, its behavior is simple [12]. In that case, it refers to the process's unknown causal state $\mathcal{S}_0$ and is denoted $H[\mathcal{S}_0 X_{0:L}]$.

Its simplicity is a direct consequence of the causal states' efficient encoding of the past. To see this, note that differences in the state-block entropy curve (the rate at which it grows with block length) are constant:

$$
\begin{aligned}
H[\mathcal{S}_0 X_{0:L+1}] - H[\mathcal{S}_0 X_{0:L}] &= H[X_L \mid \mathcal{S}_0 X_{0:L}] \\
&= H[X_L \mid \mathcal{S}_{0:L+1} X_{0:L}] \\
&= H[X_L \mid \mathcal{S}_L] \\
&= h_\mu .
\end{aligned}
$$

Here, we used the unifilarity property of ε-machines: $H[\mathcal{S}_{L+1} \mid \mathcal{S}_L, X_L] = 0$. So, given the causal state $\mathcal{S}_0$, the block $X_{0:L}$ of symbols immediately following it determines each causal state along the way, $\mathcal{S}_0, \ldots, \mathcal{S}_L$. Since causal states are sufficient statistics for prediction, the future symbol $X_L$ depends only on the most recent causal state $\mathcal{S}_L$. Finally, the optimality of ε-machines means that the next symbol can be predicted at the entropy rate $h_\mu$.

In other words, the state-block entropy that employs a process's ε-machine presentation is a straight line with slope $h_\mu$ and $y$-intercept $H[\mathcal{S}_0 X_{0:0}] = H[\mathcal{S}_0] = C_\mu$. Note that $H[\mathcal{S}_0 X_{0:L}] \ge H[X_{0:L}]$, with equality if and only if $H[\mathcal{S}_0 \mid X_{0:L}] = 0$. Since conditioning never increases uncertainty, these two block-entropy curves remain equal from that point onward. This necessarily implies that they tend to the same asymptote. So, if the state-block entropy curve ever equals the block entropy curve, then the $y$-intercepts of their asymptotes must also be equal: $C_\mu = \mathbf{E}$. Stated differently, the two curves meet, at least asymptotically, if and only if the process has $\chi = 0$.
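The straight-line behavior of the state-block entropy can be checked numerically. The sketch below tracks $(\mathcal{S}_0, X_{0:L})$ jointly and compares $H[\mathcal{S}_0 X_{0:L}]$ with $C_\mu + h_\mu L$ and with the block entropy. Two choices here are assumptions for illustration, not code from the paper:

- the dictionary encoding `machine[state] = [(symbol, probability, next_state), ...]`;
- the Golden Mean Process as a stand-in example.

```python
import math
from collections import defaultdict

# Assumed encoding: machine[state] = [(symbol, probability, next_state), ...].
GOLDEN_MEAN = {"A": [("1", 0.5, "A"), ("0", 0.5, "B")], "B": [("1", 1.0, "A")]}
PI = {"A": 2 / 3, "B": 1 / 3}  # stationary causal-state distribution


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def state_block_and_block(machine, pi, L):
    """Return (H[S_0 X_{0:L}], H[X_{0:L}])."""
    dist = {(s, "", s): p for s, p in pi.items()}  # (start state, word, current state)
    for _ in range(L):
        nxt = defaultdict(float)
        for (s0, w, s), p in dist.items():
            for x, q, t in machine[s]:
                nxt[(s0, w + x, t)] += p * q
        dist = nxt
    sb, b = defaultdict(float), defaultdict(float)
    for (s0, w, _), p in dist.items():
        sb[(s0, w)] += p  # unifilarity: (s0, w) fixes the current state
        b[w] += p
    return entropy(sb.values()), entropy(b.values())


C_mu = entropy(PI.values())
h_mu = sum(PI[s] * entropy(q for _, q, _ in GOLDEN_MEAN[s]) for s in GOLDEN_MEAN)
rows = [state_block_and_block(GOLDEN_MEAN, PI, L) for L in range(6)]
for L, (sb, b) in enumerate(rows):
    print(f"L={L}  H[S0 X]={sb:.5f}  C_mu+h_mu*L={C_mu + h_mu * L:.5f}  H[X]={b:.5f}")
```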

C. Block-state entropy 

Finally, we consider the block-state entropy $H[X_{0:L}\mathcal{R}_L]$, a measure of the joint uncertainty one has in a block of symbols and the presentation's subsequent internal state. Once again, our interest here is with ε-machines, and so we consider $H[X_{0:L}\mathcal{S}_L]$. Unlike the state-block entropy, however, the behavior of this entropy is nontrivial. We recall a number of its properties and also establish the equivalence of the cryptic order definitions given in Refs. [9, 12]. Then, we provide a detailed proof of its convexity, since no such proof has appeared previously.

The block-state entropy begins at $C_\mu$ when $L = 0$. As $L$ increases, the curve is nondecreasing and tends, from above, to the same linear asymptote as the block entropy: $\mathbf{E} + h_\mu L$. The state-block entropy is $C_\mu + h_\mu L$, and $C_\mu \ge \mathbf{E}$. So the state-block entropy curve is greater than or equal to the block-state entropy: $H[\mathcal{S}_0 X_{0:L}] \ge H[X_{0:L}\mathcal{S}_L]$. Equality for $L > 0$ occurs if and only if $C_\mu = \mathbf{E}$ or, equivalently, $\chi = 0$. In that case the curves are equal for all $L$.

Similarly, the block-state entropy is greater than or equal to the block entropy: $H[X_{0:L}\mathcal{S}_L] \ge H[X_{0:L}]$. We have equality if and only if $H[\mathcal{S}_L \mid X_{0:L}] = 0$. Recall, the smallest such $L$ is the Markov order $R$. So, the block-state entropy first equals the block entropy at the Markov order. Further, once the curves are equal, they remain equal:

$$H[X_{0:L}\mathcal{S}_L] = H[X_{0:L}] \implies H[X_{0:L+1}\mathcal{S}_{L+1}] = H[X_{0:L+1}] .$$

This can be shown by individually expanding both $H[X_{0:L+1}\mathcal{S}_{L+1}]$ and $H[X_{0:L+1}]$ to $H[X_{0:L}] + h_\mu$. The interpretation is that the two curves first become equal at the Markov order, and only after both curves have reached their linear asymptotes.
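Since $H[X_{0:L}\mathcal{S}_L] - H[X_{0:L}] = H[\mathcal{S}_L \mid X_{0:L}]$, the Markov order can be read off as the first $L$ at which the two curves coincide. The sketch below does this under three assumptions:

- the hypothetical dictionary encoding `machine[state] = [(symbol, probability, next_state), ...]`;
- a numerical tolerance for deciding that the difference has vanished;
- a finite search horizon `max_L`. A result of `None` only means the order was not reached by `max_L`; it does not prove the order is infinite.

```python
import math
from collections import defaultdict

# Assumed encoding: machine[state] = [(symbol, probability, next_state), ...].
GOLDEN_MEAN = {"A": [("1", 0.5, "A"), ("0", 0.5, "B")], "B": [("1", 1.0, "A")]}
EVEN = {"A": [("0", 0.5, "A"), ("1", 0.5, "B")], "B": [("1", 1.0, "A")]}
PI = {"A": 2 / 3, "B": 1 / 3}  # stationary distribution for both machines above


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def block_and_block_state(machine, pi, L):
    """Return (H[X_{0:L}], H[X_{0:L} S_L])."""
    dist = {("", s): p for s, p in pi.items()}  # (word, S_L) -> probability
    for _ in range(L):
        nxt = defaultdict(float)
        for (w, s), p in dist.items():
            for x, q, t in machine[s]:
                nxt[(w + x, t)] += p * q
        dist = nxt
    words = defaultdict(float)
    for (w, _), p in dist.items():
        words[w] += p
    return entropy(words.values()), entropy(dist.values())


def markov_order(machine, pi, max_L=10, tol=1e-9):
    """First L <= max_L with H[S_L | X_{0:L}] ~ 0, else None."""
    for L in range(max_L + 1):
        b, bs = block_and_block_state(machine, pi, L)
        if bs - b < tol:
            return L
    return None


print(markov_order(GOLDEN_MEAN, PI), markov_order(EVEN, PI, max_L=8))
```

For the Even Process, words made only of 1s never reveal the state, so the search runs out. This is consistent with the process's infinite Markov order, but a finite search cannot prove it.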

Reference [12] defined the cryptic order as the minimum $L$ for which the block-state entropy reaches its asymptote. This is in contrast to the definition provided here and also in Ref. [9], which defines the cryptic order as the minimum $L$ for which $H[\mathcal{S}_L \mid X_{0:}] = 0$. We now establish the equivalence of these two definitions.

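Theorem 1.

$$H[\mathcal{S}_L \mid X_{0:}] = 0 \iff H[X_{0:L}\mathcal{S}_L] = \mathbf{E} + h_\mu L . \quad (1)$$

Proof.

$$
\begin{aligned}
& H[\mathcal{S}_L \mid X_{0:}] = 0 && (2)\\
\iff\; & H[\mathcal{S}_0 \mid X_{0:}] = H[\mathcal{S}_0 \mid X_{0:L}, \mathcal{S}_L] && (3)\\
\iff\; & I[\mathcal{S}_0 ; X_{0:}] = I[\mathcal{S}_0 ; X_{0:L}, \mathcal{S}_L] && (4)\\
\iff\; & \mathbf{E} = H[X_{0:L}, \mathcal{S}_L] - H[X_{0:L}, \mathcal{S}_L \mid \mathcal{S}_0] && (5)\\
\iff\; & H[X_{0:L}, \mathcal{S}_L] = \mathbf{E} + H[\mathcal{S}_L \mid \mathcal{S}_0, X_{0:L}] + H[X_{0:L} \mid \mathcal{S}_0] && (6)\\
\iff\; & H[X_{0:L}, \mathcal{S}_L] = \mathbf{E} + h_\mu L . && (7)
\end{aligned}
$$

The step from Eq. (2) to Eq. (3) follows from Thm. 1 of Ref. […]. In moving from Eq. (4) to Eq. (5) we used the prescience of causal states, $\mathbf{E} = I[\mathcal{S}_0 ; X_{0:}]$ [20]. Finally, Eq. (6) leads to Eq. (7) using two facts. The first is the unifilarity of ε-machines, $H[\mathcal{S}_L \mid \mathcal{S}_0, X_{0:L}] = 0$. The second is that ε-machines allow for prediction at the process entropy rate: $H[X_{0:L} \mid \mathcal{S}_0] = h_\mu L$. □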

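We obtain estimates for the crypticity $\chi$ by considering the difference between the state-block and block-state entropy curves:

$$
\begin{aligned}
\chi(L) &= H[\mathcal{S}_0 X_{0:L}] - H[X_{0:L}\mathcal{S}_L] && (8)\\
&= h_\mu L - H[X_{0:L} \mid \mathcal{S}_L] . && (9)
\end{aligned}
$$

Ref. [11] showed that this approximation limits from below in a nondecreasing manner to the process crypticity: $\chi(L) \to \chi$ and $\chi(L) \le \chi(L+1) \le \chi$. This also provides an upper-bound estimate of the excess entropy:

$$\mathbf{E} \le C_\mu - \chi(L) .$$

Combined with the lower-bound estimate $\mathbf{E}(L)$ that the block entropy provides, this brackets the excess entropy, $\mathbf{E}(L) \le \mathbf{E} \le C_\mu - \chi(L)$. As a result, one can be confident in finite-$L$ estimates of it.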
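The bracketing $\mathbf{E}(L) \le \mathbf{E} \le C_\mu - \chi(L)$ is easy to evaluate for small machines. This sketch computes all three entropy curves at once and returns the two bounds together with $\chi(L)$ from Eq. (8). The following are assumptions for illustration:

- the dictionary encoding `machine[state] = [(symbol, probability, next_state), ...]`;
- the two example machines, Golden Mean and Even;
- their hand-entered stationary distributions.

```python
import math
from collections import defaultdict

# Assumed encoding: machine[state] = [(symbol, probability, next_state), ...].
GOLDEN_MEAN = {"A": [("1", 0.5, "A"), ("0", 0.5, "B")], "B": [("1", 1.0, "A")]}
EVEN = {"A": [("0", 0.5, "A"), ("1", 0.5, "B")], "B": [("1", 1.0, "A")]}
PI = {"A": 2 / 3, "B": 1 / 3}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def curves(machine, pi, L):
    """Return (H[X_{0:L}], H[S_0 X_{0:L}], H[X_{0:L} S_L])."""
    dist = {(s, "", s): p for s, p in pi.items()}
    for _ in range(L):
        nxt = defaultdict(float)
        for (s0, w, s), p in dist.items():
            for x, q, t in machine[s]:
                nxt[(s0, w + x, t)] += p * q
        dist = nxt
    b, sb, bs = defaultdict(float), defaultdict(float), defaultdict(float)
    for (s0, w, sL), p in dist.items():
        b[w] += p
        sb[(s0, w)] += p
        bs[(w, sL)] += p
    return entropy(b.values()), entropy(sb.values()), entropy(bs.values())


def excess_entropy_bounds(machine, pi, L):
    """Return (E(L), C_mu - chi(L), chi(L)) with chi(L) from Eq. (8)."""
    C_mu = entropy(pi.values())
    h_mu = sum(pi[s] * entropy(q for _, q, _ in machine[s]) for s in machine)
    b, sb, bs = curves(machine, pi, L)
    chi_L = sb - bs
    return b - h_mu * L, C_mu - chi_L, chi_L


for L in range(1, 6):
    lo, hi, chi_L = excess_entropy_bounds(EVEN, PI, L)
    print(f"Even L={L}: {lo:.4f} <= E <= {hi:.4f}   chi(L)={chi_L:.4f}")
```

For the co-unifilar Even Process, $\chi(L) = 0$. The upper bound therefore equals $C_\mu$, while $\mathbf{E}(L)$ rises toward it. For the Golden Mean Process, the two bounds already coincide at $L = 1$.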

The retrodictive error $H[X_{0:L} \mid \mathcal{S}_L]$ is the difference of the block-state entropy from the statistical complexity. It is also the difference of $\chi(L)$ from $h_\mu L$. Furthermore, it follows from Ref. [12] that the asymptotic retrodiction rate [23] is equal to the process entropy rate:

$$\lim_{L \to \infty} \frac{H[X_{0:L} \mid \mathcal{S}_L]}{L} = h_\mu .$$

In a sense, the finite-$L$ retrodictive error describes short-term retrodiction. As we will see in a moment, order-$R$ spin chains are a class of processes that have no retrodiction error for a full $R$-block. The opposite class, in this sense, consists of processes with $\chi = 0$, that is, the co-unifilar processes. These immediately begin retrodicting at the optimal rate, which is $h_\mu$.

Finally, we establish the convexity of the block-state entropy, which appears to be new.

Theorem 2. $H[X_{0:L}\mathcal{S}_L]$ is convex upwards in $L$.
FIG. 6. Four-variable I-diagram for the block-state entropy convexity proof, with the needed sigma-algebra atoms appropriately labeled. [Figure: circles for $H[X_{0:L-1}]$, $H[X_{L-1}\mathcal{S}_L]$, $H[X_{-1}]$, and $H[\mathcal{S}_{L-1}]$.]



Proof. Convexity here means:

$$H[X_{0:L+1}\mathcal{S}_{L+1}] - H[X_{0:L}\mathcal{S}_L] \ge H[X_{0:L}\mathcal{S}_L] - H[X_{0:L-1}\mathcal{S}_{L-1}] .$$

Stationarity gives us:

$$H[X_{-1:L}\mathcal{S}_L] - H[X_{0:L}\mathcal{S}_L] \ge H[X_{-1:L-1}\mathcal{S}_{L-1}] - H[X_{0:L-1}\mathcal{S}_{L-1}] .$$

Simplifying, we have:

$$H[X_{-1} \mid X_{0:L}\mathcal{S}_L] \ge H[X_{-1} \mid X_{0:L-1}\mathcal{S}_{L-1}] .$$

We can use the I-diagram of Fig. 6 to help understand this last convexity statement. There, it translates into:

$$\alpha + \gamma \ge \alpha + \beta$$

or, cancelling the common atom $\alpha$:

$$\gamma \ge \beta . \quad (10)$$

Using the fact that the causal state is an optimal representation of the past, we have the following expressions that are asymptotically equivalent to the entropy rate $h_\mu$:

$$
\begin{aligned}
H[X_{L-1}\mathcal{S}_L \mid \mathcal{S}_{L-1}] &= \beta + \epsilon + \delta + \zeta \\
H[X_{L-1}\mathcal{S}_L \mid \mathcal{S}_{L-1} X_{0:L-1}] &= \beta + \zeta \\
H[X_{L-1}\mathcal{S}_L \mid \mathcal{S}_{L-1} X_{-1}] &= \epsilon + \zeta \\
H[X_{L-1}\mathcal{S}_L \mid \mathcal{S}_{L-1} X_{-1} X_{0:L-1}] &= \zeta .
\end{aligned}
$$

The associations with the sigma-algebra atoms are readily gleaned from the I-diagram. Note that the various finite-$L$ expressions for the entropy rate rely on the shielding property of the causal states and also on the ε-machine's unifilarity. Taken together in the $L \to \infty$ limit, the four relations yield:

$$\zeta = h_\mu \quad\text{and}\quad \beta = \delta = \epsilon = 0 .$$

These, in turn, transform the convexity criterion of Eq. (10) into the simple statement that:

$$\gamma \ge 0 .$$

The atom $\gamma = I[X_{-1} ; \mathcal{S}_{L-1} \mid X_{0:L}\mathcal{S}_L]$ is a conditional mutual information and is therefore nonnegative. This establishes that the block-state entropy is convex. □
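Theorem 2 can be spot-checked numerically. Second differences of $H[X_{0:L}\mathcal{S}_L]$ should be nonnegative, up to rounding. This is a sanity check on three small machines, not a substitute for the proof. The third machine, `TWO_WORD`, is a hypothetical order-1 Markov example built for illustration: two different symbols lead to the same causal state. The dictionary encoding is likewise an assumption.

```python
import math
from collections import defaultdict

# Assumed encoding: machine[state] = [(symbol, probability, next_state), ...].
GOLDEN_MEAN = {"A": [("1", 0.5, "A"), ("0", 0.5, "B")], "B": [("1", 1.0, "A")]}
EVEN = {"A": [("0", 0.5, "A"), ("1", 0.5, "B")], "B": [("1", 1.0, "A")]}
PI = {"A": 2 / 3, "B": 1 / 3}
# Hypothetical 3-symbol example: symbols 1 and 2 both lead to state Q.
TWO_WORD = {
    "P": [("0", 0.5, "P"), ("1", 0.25, "Q"), ("2", 0.25, "Q")],
    "Q": [("0", 0.25, "P"), ("1", 0.5, "Q"), ("2", 0.25, "Q")],
}
PI_TWO = {"P": 1 / 3, "Q": 2 / 3}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def block_state_entropy(machine, pi, L):
    """H[X_{0:L} S_L]."""
    dist = {("", s): p for s, p in pi.items()}  # (word, S_L) -> probability
    for _ in range(L):
        nxt = defaultdict(float)
        for (w, s), p in dist.items():
            for x, q, t in machine[s]:
                nxt[(w + x, t)] += p * q
        dist = nxt
    return entropy(dist.values())


def second_differences(machine, pi, max_L=8):
    """Second differences at L = 1..max_L-1; convexity means each is >= 0."""
    v = [block_state_entropy(machine, pi, L) for L in range(max_L + 1)]
    return [v[L + 1] - 2 * v[L] + v[L - 1] for L in range(1, max_L)]


for name, m, pi in [("Golden Mean", GOLDEN_MEAN, PI), ("Even", EVEN, PI), ("TWO_WORD", TWO_WORD, PI_TWO)]:
    print(name, [round(d, 4) for d in second_differences(m, pi)])
```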




FIG. 7. The block $H[X_{0:L}]$, state-block $H[\mathcal{S}_0 X_{0:L}]$, and block-state $H[X_{0:L}\mathcal{S}_L]$ entropy curves compared as functions of word length $L$. The sloped dashed line is the asymptote $\mathbf{E} + h_\mu L$, which both the block entropy and the block-state entropy approach. The vertical dashed lines illustrate finite Markov order and finite cryptic order. They mark where the block entropy and the block-state entropy, respectively, meet the linear asymptote. The convergence of the crypticity approximation $\chi(L)$ to $\chi$ is also shown.

It will help to summarize the point that we have now reached. We used the various block entropy curves to synthesize much of our information-theoretic viewpoint of a process into a single representation: that shown in Fig. 7. We can amortize the effort to develop this viewpoint by applying it to a broad class of processes familiar from statistical mechanics.



VI. CRYPTICITY IN SPIN CHAINS 

We first consider a subset of processes drawn from statistical mechanics known as one-dimensional spin chains. (For background, see Refs. [HIISS].) They are processes such that $H[X_{0:R}] = C_\mu$. Using this, the simple geometry presented in Fig. 7 reveals that:

$$
\chi(L) =
\begin{cases}
h_\mu L , & 0 \le L \le R , \\
\chi , & L \ge R ,
\end{cases}
\quad (11)
$$

and this, in turn, implies that $k = R$. This can also be seen through Eq. (9). Recall, the block-state entropy is nondecreasing and begins at $C_\mu$. Since spin chains have $H[X_{0:R}] = C_\mu$, we know that the block-state entropy curve for spin chains must remain flat until $L = R$. Consequently, $H[X_{0:L} \mid \mathcal{S}_L] = 0$ and $\chi(L) = h_\mu L$ for $L \le R$. Notice that a nonvanishing $H[X_{0:L} \mid \mathcal{S}_L]$ gives a way to understand how $\chi(L)$ deviates from linear growth. That is, the nonlinearity of the approach of $\chi(L)$ to $\chi$ is exactly the coentropy $H[X_{0:L} \mid \mathcal{S}_L]$.

This property is tantamount to a very simple test to determine if a process is a spin chain. If one obtains a plot similar to Fig. 7 for the process in question, it is a spin chain if $H[X_{0:L}\mathcal{S}_L]$ goes flat from $(0, C_\mu)$ to $(R, C_\mu)$ and then follows $\mathbf{E} + h_\mu L$. That is, the block-state entropy curve is flat until it reaches its asymptote at $L = k = R$, at which point it tracks it.
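Once the Markov order $R$ is known, the spin-chain test reduces to a single comparison. The sketch below makes three assumptions:

- $R$ is supplied by the caller, for instance from a separate Markov-order estimate;
- machines use a hypothetical dictionary encoding, `machine[state] = [(symbol, probability, next_state), ...]`;
- `TWO_WORD` is an invented order-1 process in which one causal state is induced by two different words, in the spirit of Fig. 10.

```python
import math
from collections import defaultdict

# Assumed encoding: machine[state] = [(symbol, probability, next_state), ...].
GOLDEN_MEAN = {"A": [("1", 0.5, "A"), ("0", 0.5, "B")], "B": [("1", 1.0, "A")]}
PI = {"A": 2 / 3, "B": 1 / 3}
# Hypothetical order-1 process: words "1" and "2" both induce state Q.
TWO_WORD = {
    "P": [("0", 0.5, "P"), ("1", 0.25, "Q"), ("2", 0.25, "Q")],
    "Q": [("0", 0.25, "P"), ("1", 0.5, "Q"), ("2", 0.25, "Q")],
}
PI_TWO = {"P": 1 / 3, "Q": 2 / 3}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def block_entropy(machine, pi, L):
    """H[X_{0:L}] for a unifilar machine started in its stationary distribution."""
    dist = {("", s): p for s, p in pi.items()}
    for _ in range(L):
        nxt = defaultdict(float)
        for (w, s), p in dist.items():
            for x, q, t in machine[s]:
                nxt[(w + x, t)] += p * q
        dist = nxt
    words = defaultdict(float)
    for (w, _), p in dist.items():
        words[w] += p
    return entropy(words.values())


def is_spin_chain(machine, pi, R, tol=1e-9):
    """For an order-R Markov epsilon-machine, test whether H[X_{0:R}] == C_mu."""
    return abs(block_entropy(machine, pi, R) - entropy(pi.values())) < tol


print(is_spin_chain(GOLDEN_MEAN, PI, R=1), is_spin_chain(TWO_WORD, PI_TWO, R=1))
```

For `TWO_WORD`, $H[X_0] \approx 1.55$ bits exceeds $C_\mu \approx 0.92$ bits. The reason is that length-1 words no longer correspond one-to-one with causal states.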

Furthermore, consider (i) the above proof, (ii) the convexity proof from Sec. V C, and (iii) the fact that $k \le R$. Together these show that, for a given $\mathbf{E}$, $h_\mu$, and $R$, spin chains are maximally cryptic processes. By this we mean that for all processes with a particular set of values for $\mathbf{E}$, $h_\mu$, and $R$, the process that maximizes $\chi$ is a spin chain. This implies that $C_\mu$ is also maximized.
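One way to see the maximality claim is the following short derivation. It is our own, built from Eq. (9), the fact that $\chi(L) = \chi$ once $L \ge k$, and $k \le R$:

$$
\chi = \chi(R) = h_\mu R - H[X_{0:R} \mid \mathcal{S}_R] \le h_\mu R
\quad\Longrightarrow\quad
C_\mu = \mathbf{E} + \chi \le \mathbf{E} + h_\mu R .
$$

Equality holds exactly when $H[X_{0:R} \mid \mathcal{S}_R] = 0$. A spin chain satisfies two conditions: $H[\mathcal{S}_R \mid X_{0:R}] = 0$, because it is order-$R$ Markov, and $H[X_{0:R}] = C_\mu$. Together these give $H[X_{0:R} \mid \mathcal{S}_R] = H[X_{0:R}] + H[\mathcal{S}_R \mid X_{0:R}] - H[\mathcal{S}_R] = 0$, so the bound is attained.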




FIG. 8. An order-2 Markov spin chain with full support. 

Figures 8 and 9 show two order-2 Markov spin chains. The first is a full-support order-2 Markov chain, while the second has only partial support. In fact, the latter process has the Golden Mean support consisting of all bi-infinite sequences that do not contain consecutive 0s.

Figure 10 gives an ε-machine of similar structure to the spin chains just examined. Although it is also an order-2 Markov process, it is not a spin chain. The reason is that one causal state (labeled "01, 11") is induced by two words: 01 and 11. This means that the correspondence between inducing words and causal states is broken. It is no longer a spin chain.

FIG. 9. An order-2 Markov spin chain with partial support.

FIG. 10. An order-2 Markov process, but not a spin chain.

We close this section with a number of open questions about spin chains. The first two regard the structure of spin chains. If an ε-machine is a subgraph of an order-$R$ Markov skeleton, then is it a spin chain? That is, does the removal of an edge from a spin chain produce another spin chain? The intuition behind this question is straightforward: Removing transitions disallows blocks, but it would not cause any block to be associated with a different state. A related question asks if all spin chains are of this form.

The next two questions regard the transformation from a spin chain to any other process and vice versa. First, can any order-$R$ Markov, order-$k$ cryptic ε-machine be obtained by starting with an order-$R$ Markov skeleton, reducing some probabilities to zero, and adjusting others to cause state merging? Second, given an order-$R$ Markov, order-$k$ cryptic ε-machine, we can break the existing degeneracy so that $H[X_{0:R}] = C_\mu$. How does the non-spin-chain process we started with compare with the spin chain we end up with?
VII. GEOMETRIC CONSTRAINTS

The geometry of the block entropy convergence illustrated in Fig. 7 can be exploited. In particular, as we will now show, a variety of constraints leads to further results on the convergence behaviors that the block and block-state entropy curves are allowed to express. Figure 11 depicts these results graphically.
FIG. 11. Constraints on entropy convergence, illustrated for a
process that is order-5 Markov and order-4 cryptic. The blue 
region circumscribes where the block entropy curve can lie; 
the tan, where the block-state entropy may be. These and 
the discreteness of L lead to restrictions on allowed cryptic 
orders as well. 

First, given the block entropy's concavity and its asymptote, one sees that the block entropy curve is contained within the triangle described by $\{(0,0), (0,\mathbf{E}), (R, H[X_{0:R}])\}$. We also know that the block entropy cannot grow faster than $H[X_0]$, and this excludes the triangle $\{(0,0), (0,\mathbf{E}), (1, H[X_0])\}$. The resulting allowed region is shown in light blue in Fig. 11.

Second and similarly, the block-state entropy's own properties require it to be within a triangle described by $\{(0, C_\mu), (\chi/h_\mu, C_\mu), (R, H[X_{0:R}])\}$.

Third, since the entropy functions are defined for discrete values of word length $L$, we can go a little further than these observations. The block-state entropy cannot intersect the asymptote $\mathbf{E} + h_\mu L$ at a noninteger $L$. Therefore, the small triangle $\{(\lfloor \chi/h_\mu \rfloor, C_\mu), (\chi/h_\mu, C_\mu), (\lceil \chi/h_\mu \rceil, \mathbf{E} + \lceil \chi/h_\mu \rceil h_\mu)\}$ is excluded. The resulting allowed trapezoid is displayed in tan in Fig. 11.

Fourth, recalling results on the block-state entropy, this exclusion means that processes with $C_\mu \ne \mathbf{E} + h_\mu k$ for every integer $k$ must have a degree of nonoptimal retrodiction. In short, they are prevented from being spin chains.



Finally, given a process that has cryptic order $k$, we see that $C_\mu \le \mathbf{E} + h_\mu k$. A more detailed result then says that $C_\mu = \mathbf{E} + h_\mu k$ if and only if […] $= C_\mu$. Moreover, such a process is order-$k$ Markov; that is, it is a spin chain.
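The discreteness argument also yields a simple lower bound on cryptic order. Rearranging $C_\mu \le \mathbf{E} + h_\mu k$ gives $k \ge \chi/h_\mu$, so $k \ge \lceil \chi/h_\mu \rceil$. The helper below is a sketch that returns that bound together with the vertices of the excluded small triangle. It assumes $C_\mu$, $\mathbf{E}$, and $h_\mu > 0$ are already known or estimated. The function name and tolerance are our own choices.

```python
import math


def cryptic_order_constraints(C_mu, E, h_mu, tol=1e-9):
    """Return (k_min, excluded_triangle) from the Sec. VII geometry.

    x = chi / h_mu is where the asymptote E + h_mu * L reaches the level C_mu.
    The block-state entropy can only meet the asymptote at integer L, so the
    cryptic order satisfies k >= ceil(x). The tolerance keeps nearly integer
    values such as 0.9999999999 from being rounded the wrong way.
    """
    if h_mu <= 0:
        raise ValueError("requires a positive entropy rate")
    chi = C_mu - E
    x = chi / h_mu
    lo = math.floor(x + tol)
    hi = math.ceil(x - tol)
    triangle = [(lo, C_mu), (x, C_mu), (hi, E + hi * h_mu)]
    return hi, triangle


k_min, tri = cryptic_order_constraints(C_mu=2.0, E=0.8, h_mu=0.5)
print(k_min, tri)
```

For the Golden Mean Process ($\chi = h_\mu = 2/3$ bit), the bound gives $k \ge 1$, which is attained.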



VIII. THE CRYPTIC MARKOVIAN ZOO 

It turns out that there exist finite-state processes with all combinations of Markov and cryptic order, subject, of course, to the constraint that $R \ge k$. These range from the zero-structural-complexity independent, identically distributed processes, for which $R = 0$ and $k = 0$, to few-state processes where either or both are infinite. (For a complementary and exhaustive survey, see Ref. [26].) In practice, given what we now know about these properties, it is not difficult to design a variety of processes that fulfill a given specification.

Also noteworthy is how the introduction of the new crypticity "coordinate" affects our view of several well-studied examples. For instance, the Even Process [5] is one of the canonical finite-state, infinite-order Markov processes. In the past, it was often thought of as representing both intractability and compactness. Now, though, we see that it is trivial, being 0-cryptic. The Golden Mean Process, one of the simplest (order-1 Markov) subshifts of finite type studied, is now seen as more sophisticated, being 1-cryptic. These and similar explorations naturally lead one to delve deeper to find extreme examples, such as the Nemo Process below, that are infinite in both cryptic and Markov orders. Again, see Ref. [26].

Figure 12 presents a crypticity-Markovity roadmap for the space of finite-state processes. Borrowing from the immediately preceding citations, it also displays a few selected processes, via their ε-machines, to show concretely the full diversity of Markov and cryptic orders a finite-state process can possess. The green bar at $k = 0$ consists of all co-unifilar processes. The orange line contains all processes whose Markov and cryptic orders are identical; the spin chains are a subset of these. All other processes lie above this line. The Even Process is in the upper left corner. The Golden Mean Process (no consecutive 0s) is in the lower left. The $\infty$-cryptic, infinite-order Markov Nemo Process is in the upper right corner. Several of the other prototype ε-machines depicted illustrate $(R, k)$-parametrized classes of processes for which the Markov and cryptic orders can be selected arbitrarily.
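To place a given machine on the roadmap of Fig. 12, one can estimate both orders from the entropy curves. The sketch below uses two rules:

- Markov order: $R = \min\{L : H[\mathcal{S}_L \mid X_{0:L}] = 0\}$.
- Cryptic order: the first $L$ at which $\chi(L)$ stops increasing.

The second rule is our own inference. $\chi(L)$ is nondecreasing. It is also concave, because $H[\mathcal{S}_0 X_{0:L}]$ is linear while $H[X_{0:L}\mathcal{S}_L]$ is convex (Theorem 2). So once $\chi(L)$ stops growing, it has reached $\chi$.

Results are limited by the search horizon `max_L`, and `None` only means the order was not reached by then. The dictionary encoding and example machines are assumptions for illustration.

```python
import math
from collections import defaultdict

# Assumed encoding: machine[state] = [(symbol, probability, next_state), ...].
GOLDEN_MEAN = {"A": [("1", 0.5, "A"), ("0", 0.5, "B")], "B": [("1", 1.0, "A")]}
EVEN = {"A": [("0", 0.5, "A"), ("1", 0.5, "B")], "B": [("1", 1.0, "A")]}
PI = {"A": 2 / 3, "B": 1 / 3}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def curves(machine, pi, L):
    """Return (H[X_{0:L}], H[S_0 X_{0:L}], H[X_{0:L} S_L])."""
    dist = {(s, "", s): p for s, p in pi.items()}
    for _ in range(L):
        nxt = defaultdict(float)
        for (s0, w, s), p in dist.items():
            for x, q, t in machine[s]:
                nxt[(s0, w + x, t)] += p * q
        dist = nxt
    b, sb, bs = defaultdict(float), defaultdict(float), defaultdict(float)
    for (s0, w, sL), p in dist.items():
        b[w] += p
        sb[(s0, w)] += p
        bs[(w, sL)] += p
    return entropy(b.values()), entropy(sb.values()), entropy(bs.values())


def estimate_orders(machine, pi, max_L=8, tol=1e-9):
    """Return (R, k) estimated up to max_L; None means 'not reached by max_L'."""
    data = [curves(machine, pi, L) for L in range(max_L + 2)]
    chi = [sb - bs for _, sb, bs in data]  # chi(L) from Eq. (8)
    R = next((L for L in range(max_L + 1) if data[L][2] - data[L][0] < tol), None)
    k = next((L for L in range(max_L + 1) if chi[L + 1] - chi[L] < tol), None)
    return R, k


print("Golden Mean (R, k):", estimate_orders(GOLDEN_MEAN, PI))
print("Even (R, k):", estimate_orders(EVEN, PI))
```

These results match the placements described in the text. The Even Process sits at $k = 0$ with no finite Markov order found, and the Golden Mean Process sits at $R = k = 1$.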
FIG. 12. The crypticity-Markovity roadmap for finite-state stationary processes: The range of possible Markov and cryptic orders, illustrated by a sample of processes depicted by their ε-machines. Lower left: The Fair Coin Process and all other IID processes. Upper left: The 0-cryptic, infinite-order Markov Even Process. Upper right: The Nemo Process. Left vertical (green) line: The co-unifilar processes.



IX. INFORMATION DIAGRAMS FOR 
STATIONARY PROCESSES 

Information diagrams, or simply I-diagrams, are an important tool when using information theory to analyze multivariate stochastic processes [27]. They are particularly useful when working with processes and, as we have already seen here, give a good deal of insight when the ε-machine presentation is employed [5, 11].

The essential idea is that there is a one-to-one correspondence between information-theoretic quantities (mutual information and conditional and joint entropies) and measurable sets. Constructively, informational relationships and constraints are depicted via set-theoretic operations. Joint entropies are set unions, conditional entropies correspond to set differences, mutual information corresponds to set intersection, and the like. The mathematical structure is a sigma algebra over the process's events (words). The noncomposite sets are the atoms of the sigma algebra, and their size is the magnitude of the corresponding informational quantities. When depicted graphically, though, one often ignores magnitudes and, instead, focuses on the set-theoretic relationships.

Armed with simple and familiar rules, one can often accomplish several algebraic steps on compound entropy expressions using a simple I-diagram and a short description. Perhaps more importantly, I-diagrams afford a visual calculus that gives heightened intuition about complicated relationships among random variables.
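For two variables the atom bookkeeping can be made explicit. The sketch below takes a joint distribution given as a dictionary `{(x, y): p}`, an input format we assume for illustration. It computes the three atoms (two set differences and the intersection) and checks that they add up to the union, $H[X,Y]$. As an illustrative input, we use consecutive symbols $(X_0, X_1)$ of the Golden Mean Process, where the word 00 is forbidden.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def two_variable_atoms(joint):
    """I-diagram atoms for two variables from a joint pmf {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    H_xy = entropy(joint.values())
    H_x, H_y = entropy(px.values()), entropy(py.values())
    return {
        "H[X|Y]": H_xy - H_y,        # set difference: X outside Y
        "I[X;Y]": H_x + H_y - H_xy,  # intersection
        "H[Y|X]": H_xy - H_x,        # set difference: Y outside X
        "H[X,Y]": H_xy,              # union
    }


# (X_0, X_1) for the Golden Mean Process: "00" never occurs.
joint = {("1", "1"): 1 / 3, ("1", "0"): 1 / 3, ("0", "1"): 1 / 3}
atoms = two_variable_atoms(joint)
for name, value in atoms.items():
    print(f"{name} = {value:.4f}")
```

Here $I[X_0;X_1] \approx 0.2516$ bits. Since this process is order-1 Markov, that intersection equals its excess entropy.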



Figures 13 through 17 make the preceding formal views of entropy convergence, and its relationship to Markovity and crypticity, more explicit and intuitive. The two large circles in each represent the past via $H[X_{:0}]$ and the future via $H[X_{0:}]$. The excess entropy $\mathbf{E} = I[X_{:0} ; X_{0:}]$, being a mutual information, is their intersection. The I-diagrams there show the nested dependence of the various information measures as one increases block size and so increases the number of random variables. In the general multivariate case this would lead to an explosion of atoms. However, due to the nature of processes and the ε-machine itself, many simplifications are possible. Figures 14 through 16 also depict the ε-machine's causal-state information, $C_\mu = H[\mathcal{S}]$, as a circle entirely inside the past $H[X_{:0}]$. This is so since the causal states are a function of the past.



To start with the simplest case, Fig. 13 gives the I-diagram for an order-4 Markov process. As one expects, only the four most recent history symbols are needed to reduce as much uncertainty in the future as using the whole past would. Equivalently, as soon as the history contains four symbols, all of the shared information between the past and the future (the excess entropy $\mathbf{E}$) is captured.

FIG. 13. Information diagram for an order-4 Markov process. Only the four most recent history symbols are needed to reduce as much uncertainty in the future as using the whole past would.

Figure 14 then overlays the causal state measure $H[\mathcal{S}]$. In this, we see that no fewer than four history symbols are required to determine the causal state. Importantly, it is now also made explicit that causal states do not generally determine this history.

FIG. 14. Causal state overlaid onto an I-diagram for an order-4 Markov process. As drawn, no fewer than 4 history symbols are required to determine the causal state. The causal state, though, does not generally determine this length-four history.

Consider now the order-4 Markov, order-3 cryptic process of Figure 15. As before, four history symbols are required to determine the state. But, as depicted, only three history symbols are required if one conditions on the future as well.

FIG. 15. An I-diagram for an order-4 Markov process, but order-3 cryptic. Four history symbols are required to determine the state, but only three are required if one conditions on the future.







FIG. 16. The separation between Markov and cryptic orders 
can be widened: A Markov order-4, cryptic order-2 process. 



Figure 16 demonstrates how the difference between 
Markov and cryptic orders can be increased without 
bound. The I-diagram illustrates the sigma-algebra for 
an order-4 Markov, order-2 cryptic process. 



Finally, Fig. 17 gives the I-diagram for an order-4 spin chain. Several features of spin chains are clearly rendered in this I-diagram. First, the shortest history that uniquely determines the state occurs at length 4. Specifically, as depicted, $\min \{ L : H[\mathcal{S}_0 \mid X_{-L:0}] = 0 \} = 4$. And, at the same time, this length-4 history is itself uniquely determined by the causal state.




FIG. 17. The highly regular I-diagram for an order-4 spin 
chain. 



X. CONCLUSION 

Crypticity, as the difference between a process's stored information and its observed information, is a key property. The fundamental definitions, Eqs. (1) and (2), though, are not immediately transparent. However, they do lead to several interpretations that prove useful in different settings. Given this, our main goals were to explicate the basic notions behind crypticity and to motivate various of its interpretations. Along the way, we provided a new geometric interpretation for cryptic order, established a number of previously outstanding properties, and illustrated crypticity by giving a complete analysis of spin chains.

More specifically, using state-paths, we introduced several new interpretations of crypticity that not only helped to explain the basic idea but also suggest future applications in distributed dynamical systems. We also gave a simple geometric picture that relates cryptic and Markov orders. We established the equivalence between co-unifilarity and being 0-cryptic, as well as the convexity of the block-state entropy $H[X_{0:L}\mathcal{S}_L]$. We derived several geometric constraints and drew out their implications for bounds on crypticity. These also led to an improved bound on Markov order. Presumably, the bounds will help improve estimates of crypticity and cryptic order, in both the finite and infinite cases.

To give a sense of the relationship between cryptic and 
Markov orders we gave a graphical overview classifying 
processes in their terms. In a complementary way, we in- 
troduced the technique of foliated information diagrams 
to analyze entropy convergence and Markov and cryp- 
tic orders in terms of Shannon information measures and 
their now block-length-dependent sigma algebra. 

To ground the results in a concrete and familiar class of processes, we analyzed range-$R$ 1D spin chains in detail. We established their Markov order. We also showed that the block-state entropy $H[X_{0:L}\mathcal{S}_L]$ is flat for spin chains and that $\chi(L) = L h_\mu$ for all $L \le R$. From these properties one can determine whether or not a given process is representable as a spin chain: Is the $R$-block entropy equal to the statistical complexity? The properties also suggest what the processes in the neighborhood of a spin chain look like.

Finally, by way of making contact with applications to physics and computation, we close by briefly outlining the relationship between crypticity and dynamical irreversibility in physical processes [28]. Consider the morph map from a causal state to the set of futures it can generate, $\mathcal{S}_0 \to \{X_{0:}\}$. A process's entropy rate controls the prediction uncertainty of this map: $h_\mu = \lim_{L \to \infty} H[X_{0:L} \mid \mathcal{S}_0]/L$. Now, consider the state uncertainty determined by the inverse of the morph map, $X_{0:} \to \{\mathcal{S}_0\}$. This is already familiar. The crypticity controls this uncertainty: $\chi = \lim_{L \to \infty} H[\mathcal{S}_0 \mid X_{0:L}]$. Just as the entropy rate is a process's rate of producing information, the crypticity is its rate of information loss. One can also call it a process's information-processing irreversibility. Appropriately adapting Landauer's Principle [53], this irreversibility provides a lower bound on the energy dissipation required to support a process's irreversible intrinsic computation. We leave the full development of the thermodynamics of intrinsic computation, however, to another venue.
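Both limits are easy to watch numerically. The sketch below computes $H[X_{0:L} \mid \mathcal{S}_0]/L$ and $H[\mathcal{S}_0 \mid X_{0:L}]$ from the joint distribution of $(\mathcal{S}_0, X_{0:L})$. Two choices are assumptions for illustration:

- the dictionary encoding `machine[state] = [(symbol, probability, next_state), ...]`;
- the Golden Mean Process as the example.

```python
import math
from collections import defaultdict

# Assumed encoding: machine[state] = [(symbol, probability, next_state), ...].
GOLDEN_MEAN = {"A": [("1", 0.5, "A"), ("0", 0.5, "B")], "B": [("1", 1.0, "A")]}
PI = {"A": 2 / 3, "B": 1 / 3}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def morph_uncertainties(machine, pi, L):
    """Return (H[X_{0:L} | S_0], H[S_0 | X_{0:L}])."""
    dist = {(s, "", s): p for s, p in pi.items()}
    for _ in range(L):
        nxt = defaultdict(float)
        for (s0, w, s), p in dist.items():
            for x, q, t in machine[s]:
                nxt[(s0, w + x, t)] += p * q
        dist = nxt
    joint, words = defaultdict(float), defaultdict(float)
    for (s0, w, _), p in dist.items():
        joint[(s0, w)] += p
        words[w] += p
    H_joint = entropy(joint.values())
    return H_joint - entropy(pi.values()), H_joint - entropy(words.values())


for L in range(1, 7):
    fwd, inv = morph_uncertainties(GOLDEN_MEAN, PI, L)
    print(f"L={L}  H[X|S0]/L={fwd / L:.4f}  H[S0|X]={inv:.4f}")
```

For this process, both columns equal 2/3 bit from $L = 1$ on. The entropy rate is 2/3 bit per symbol, and $\chi = C_\mu - \mathbf{E} = 2/3$ bit.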
Appendix A: Why Crypticity?

There are many ways to assemble information-theoretic quantities or, more specifically, information measures [17]. Why should one care about crypticity and cryptic order? What makes them special? We show that crypticity stands out among reasonable alternative measures by a rather direct comparison.

It turns out that there are fewer information quantities than one might expect (at least, fewer interesting ones) over pasts, futures, and states. Let's limit ourselves to quantities that depend on only a finite set of objects. We also require a "1-parameter finitization" property, based on block length. In this case, we can make an exhaustive list of the information measures and describe each one. The list, at first, appears long. But this length is illustrative of the fact that crypticity and cryptic order really do capture a relatively unique process property. Everything else is either trivial, periodic, or Markov.

Table I presents the list. It was assembled directly by systematically writing down alternative expressions: over single variables, over pairs of variables with their joint and conditional entropy possibilities, over three variables, and so on. One could also consider enumerating only the relevant sigma-algebra atoms. This, however, obscures parallels to existing quantities.

In addition, alternatives such as $H[X_{-L:0} \mid X_{:0}]$ are not included, since they are trivial. Nor were quantities such as $H[X_0 \mid X_{-1:0}]$ added, although they could be. Quantities along these lines would needlessly expand the list, to little benefit.

As elsewhere here, we assume the state random vari- 
able denotes a causal state. 



Appendix B: Equivalence of Forward and Reverse 
Restricted State-Paths 

Why are the restricted state-paths the same in the forward and backward lattice diagrams of Figs. 3 and 4? Recall that a forward path is allowed if $\Pr(X_{0:L} = w, \mathcal{S}_L = \sigma_b \mid \mathcal{S}_0 = \sigma_a) \ne 0$. Similarly, a backward path is allowed when $\Pr(\mathcal{S}_0 = \sigma_a, X_{0:L} = w \mid \mathcal{S}_L = \sigma_b) \ne 0$. Both causal states $\sigma_a$ and $\sigma_b$ are recurrent and so, by definition, have nonzero probability. We can therefore state both cases as paths for which $\Pr(\mathcal{S}_0 = \sigma_a, X_{0:L} = w, \mathcal{S}_L = \sigma_b) \ne 0$.




FIG. 18. Why forward and backward restricted paths are the same. In this figure state-paths are traced back from final states. Cf. Fig. 4.



Figure 18 illustrates this by tracing state-paths backward through the machine starting at each final state. Of course, since processes and their ε-machines are generally not co-unifilar, there will be splitting in these paths. For example, consider the paths that end in state A on a 1. A's predecessors on a 1 are states A and E.

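Note that this produces a different initial set of candidate state-paths, when compared with those in light blue in Fig. 4. Now, eliminate all paths that do not trace back successfully along the entire word. Fig. 18 shows these remaining state-paths in red, and we see that they are the same as those in Fig. 4.

TABLE I. Alternative information measures over the past, the future, and the causal state, when they achieve their limit at finite block length $L$. As seen, almost all are either trivial, periodic, or detect the Markov property. Cryptic order stands out as unique.

Form | Information measure | Property detected
$H[A]$ | $H[X_{:0}] = H[X_{-L:0}]$ | Periodic
$H[A]$ | $H[X_{0:}] = H[X_{0:L}]$ | Periodic
$H[A \mid B]$ | $H[\mathcal{S} \mid X_{:0}] = H[\mathcal{S} \mid X_{-L:0}]$ | Markov
$H[A \mid B]$ | $H[\mathcal{S} \mid X_{0:}] = H[\mathcal{S} \mid X_{0:L}]$ | Markov
$H[A \mid B]$ | $H[X_{0:} \mid X_{:0}] = H[X_{0:} \mid X_{-L:0}]$ | Markov
$H[A \mid B]$ | $H[X_{:0} \mid X_{0:}] = H[X_{:0} \mid X_{0:L}]$ | Markov
$H[A \mid B]$ | $H[X_{:0} \mid \mathcal{S}] = H[X_{-L:0} \mid \mathcal{S}]$ | Periodic
$H[A \mid B]$ | $H[X_{0:} \mid \mathcal{S}] = H[X_{0:L} \mid \mathcal{S}]$ | Periodic
$H[A \mid BC]$ | $H[\mathcal{S} \mid X_{:0} X_{0:}] = H[\mathcal{S} \mid X_{-L:0} X_{0:}]$ | Cryptic order
$H[A \mid BC]$ | $H[\mathcal{S} \mid X_{:0} X_{0:}] = H[\mathcal{S} \mid X_{:0} X_{0:L}]$ | Trivial
$H[A \mid BC]$ | $H[X_{:0} \mid \mathcal{S} X_{0:}] = H[X_{:0} \mid \mathcal{S} X_{0:L}]$ | Trivial
$H[A \mid BC]$ | $H[X_{0:} \mid X_{:0} \mathcal{S}] = H[X_{0:} \mid X_{-L:0} \mathcal{S}]$ | Trivial
$H[A \mid BC]$ | $H[X_{:0} \mid \mathcal{S} X_{0:}] = H[X_{-L:0} \mid \mathcal{S} X_{0:}]$ | Periodic
$H[A \mid BC]$ | $H[X_{0:} \mid X_{:0} \mathcal{S}] = H[X_{0:L} \mid X_{:0} \mathcal{S}]$ | Periodic
$H[AB]$ | $H[X_{:0} \mathcal{S}] = H[X_{-L:0} \mathcal{S}]$ | Periodic
$H[AB]$ | $H[\mathcal{S} X_{0:}] = H[\mathcal{S} X_{0:L}]$ | Periodic
$H[AB]$ | $H[X_{:0} X_{0:}] = H[X_{-L:0} X_{0:}]$ | Periodic
$H[AB]$ | $H[X_{:0} X_{0:}] = H[X_{:0} X_{0:L}]$ | Periodic
$H[AB \mid C]$ | $H[X_{:0} \mathcal{S} \mid X_{0:}] = H[X_{-L:0} \mathcal{S} \mid X_{0:}]$ | Periodic
$H[AB \mid C]$ | $H[\mathcal{S} X_{0:} \mid X_{:0}] = H[\mathcal{S} X_{0:L} \mid X_{:0}]$ | Periodic
$H[AB \mid C]$ | $H[X_{:0} \mathcal{S} \mid X_{0:}] = H[X_{:0} \mathcal{S} \mid X_{0:L}]$ | Markov
$H[AB \mid C]$ | $H[\mathcal{S} X_{0:} \mid X_{:0}] = H[\mathcal{S} X_{0:} \mid X_{-L:0}]$ | Markov
$H[ABC]$ | $H[X_{:0} \mathcal{S} X_{0:}] = H[X_{-L:0} \mathcal{S} X_{0:}]$ | Periodic
$H[ABC]$ | $H[X_{:0} \mathcal{S} X_{0:}] = H[X_{:0} \mathcal{S} X_{0:L}]$ | Periodic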
Appendix C: Crypticity and Co-unifilarity

Here, we explore the equivalence of $\mathbf{E} = C_\mu$, co-unifilarity, and 0-crypticity using several results obtained in Ref. [11]. With a small modification, the latter results allow for a more straightforward proof that leads to a better understanding of these relations.

The "forward" argument is that $\chi(k) = 0$ at some $k \ge 1$ implies crypticity vanishes at all $L$. First, we recall two results.

Corollary 6 [11]: If there exists a $k \ge 1$ for which $\chi(k) = 0$, then $\chi(j) = 0$ for all $j \ge 1$.

Proposition 3 [11]: $\lim_{k \to \infty} \chi(k) = \chi$.

Combining Cor. 6 and Prop. 3, we have the following: If there exists a $k \ge 1$ for which $\chi(k) = 0$, then $\chi(j) = 0$ for all $j \ge 1$ and $\chi = 0$.

The "backward" argument is that $\chi$ vanishing in the limit implies crypticity vanishes at all $L$.

Three facts about $\chi(k)$ combine here: it is nonnegative (it is a conditional entropy), it is nondecreasing (Prop. 2 [11]), and it limits to $\chi$ (Prop. 3 [11]). Hence $\chi = 0$ implies $\chi(k) = 0$ for all $k \ge 0$.

All that remains is to recall that co-unifilarity is identical to $\chi(1) = 0$, and this establishes the desired chain of implications:

$$
\begin{aligned}
\text{Co-unifilar} &\iff \chi(1) = 0 \\
&\iff \exists\, k \ge 1 : \chi(k) = 0 \\
&\iff \chi(k) = 0, \ \forall k \ge 0 \\
&\iff \chi = 0 \\
&\iff \text{0-cryptic} .
\end{aligned}
$$

The heart of the result falls in the middle. It shows us that any nontrivial zero in $\chi(L)$ is equivalent to the entire function, as well as $\chi$ itself, vanishing.
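The first link in this chain, co-unifilarity $\iff \chi(1) = 0$, is easy to probe on examples. The sketch below checks co-unifilarity structurally: no causal state may be entered from two different states on the same symbol. Separately, it evaluates $\chi(1) = H[\mathcal{S}_0 X_0] - H[X_0 \mathcal{S}_1]$ from Eq. (8). The dictionary encoding and example machines are assumptions for illustration.

```python
import math
from collections import defaultdict

# Assumed encoding: machine[state] = [(symbol, probability, next_state), ...].
GOLDEN_MEAN = {"A": [("1", 0.5, "A"), ("0", 0.5, "B")], "B": [("1", 1.0, "A")]}
EVEN = {"A": [("0", 0.5, "A"), ("1", 0.5, "B")], "B": [("1", 1.0, "A")]}
PI = {"A": 2 / 3, "B": 1 / 3}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def is_co_unifilar(machine):
    """True if each (target state, symbol) pair has at most one predecessor."""
    seen = set()
    for s, edges in machine.items():
        for x, q, t in edges:
            if q > 0:
                if (t, x) in seen:
                    return False
                seen.add((t, x))
    return True


def chi_1(machine, pi):
    """chi(1) = H[S_0 X_0] - H[X_0 S_1], i.e. Eq. (8) at L = 1."""
    sb, bs = defaultdict(float), defaultdict(float)
    for s, edges in machine.items():
        for x, q, t in edges:
            sb[(s, x)] += pi[s] * q
            bs[(x, t)] += pi[s] * q
    return entropy(sb.values()) - entropy(bs.values())


for name, m in [("Even", EVEN), ("Golden Mean", GOLDEN_MEAN)]:
    print(name, is_co_unifilar(m), round(chi_1(m, PI), 4))
```

The Even Process is co-unifilar with $\chi(1) = 0$, so by the chain of equivalences it is 0-cryptic. The Golden Mean Process is not co-unifilar, and its $\chi(1)$ is 2/3 bit.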